PUBLIC MARKS with tag corpus

2010

2009

2008

INIST at the 19th Festival International de Géographie - Institut de l'Information Scientifique et Technique

by parmentierf
Developed by the monitoring (Veille) department of INIST/CNRS, the NIPPOGEO application offers dynamic browsing of heterogeneous corpora: 764 bibliographic records from the Bibliographie Géographique Internationale (BGI), the Geography section of INIST/CNRS's FRANCIS database, and 185 images (glass plates, slides, and digital photographs) provided by PRODIG researchers. Freely accessible on the Internet, NIPPOGEO not only gives domain specialists access to bibliographic and bibliometric data, but also helps disseminate scientific information to the general public.

DLFP: JeuxDeMots: an online game for producing free lexical data

by parmentierf (via)
Beyond its playful side, the value of JeuxDeMots lies in the fact that it builds a lexical network from the answers given by players
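The idea behind such a game can be pictured as a weighted lexical network: each time a player proposes a word for a given cue and relation, the corresponding edge is reinforced. A minimal sketch of that mechanism (the relation name, words, and weights below are illustrative, not JeuxDeMots's actual data or API):

```python
from collections import defaultdict

# network[(source, relation)][target] = weight accumulated from player answers
network = defaultdict(lambda: defaultdict(int))

def record_answer(source, relation, target):
    """Reinforce an edge each time a player gives this answer."""
    network[(source, relation)][target] += 1

# Hypothetical game rounds: players asked for associations of "chat" (cat)
for answer in ["souris", "souris", "chien", "miauler"]:
    record_answer("chat", "associated", answer)

# The most frequently given answers rise to the top of the ranking
ranking = sorted(network[("chat", "associated")].items(),
                 key=lambda kv: -kv[1])
print(ranking)  # [('souris', 2), ('chien', 1), ('miauler', 1)]
```

Agreement between independent players is what turns raw game answers into usable lexical data: edges that only one player ever proposed stay weak and can be filtered out.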

Benoît Sagot - WOLF

by parmentierf
WOLF (Wordnet Libre du Français) is a free semantic lexical resource (a wordnet) for French.

2007

Donate your speech to VoxForge using your telephone

by kmaclean
VoxForge ( http://www.voxforge.org ) is an open source project that collects speech recordings for use in the creation of Acoustic Models. Speech recognition engines need an acoustic model to recognize speech. To create an acoustic model, you take a very large number of speech audio recordings and 'compile' them into statistical representations of the sounds that make up each word. Most open source speech recognition engines use 'closed source' acoustic models. VoxForge hopes to address this problem by creating a free, GPL-licensed speech corpus and generating acoustic models from it. You can now use your telephone to donate your speech. Click this link: http://www.voxforge.org/home/s… to get the number, and the Interactive Voice Response system will guide you through the process.

Europeana

by parmentierf & 15 others (via)
Europeana is a prototype online library developed by the Bibliothèque nationale de France as part of the European Digital Library project. Europeana gathers roughly 12,000 rights-free documents from the collections of the BnF, the National Széchényi Library of Hungary, and the National Library of Portugal.

Home - voxforge.org

by ogrisel
VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines. We will categorize and make available all submitted audio files under the GPL license, and then 'compile' them into Acoustic Models for use with Open Source Speech Recognition engines such as Sphinx, ISIP, HTK, and Julius.

Improving Open Source Speech Recognition

by kmaclean
Speech Recognition Engines require two types of files to recognize speech: an Acoustic Model, created by 'compiling' a large amount of transcribed speech into statistical models, and a Language Model (for Dictation) or Grammar file (for Command and Control). Most Acoustic Models used by 'Open Source' Speech Recognition engines are 'Closed Source': they do not give you access to the speech audio (the 'Source') used to create the Acoustic Model. The reason for this is that there is no free Speech Corpus in a form that can readily be used to create Acoustic Models for Speech Recognition Engines. Open Source projects are thus required to purchase a Speech Corpus with restrictive licensing in order to create their Acoustic Models. VoxForge (http://www.voxforge.org) was set up to address this problem. The site collects GPL-licensed transcribed speech audio from users, which is then used to create Acoustic Models. These can then be used with Free and Open Source Speech Recognition Engines such as Sphinx, ISIP, Julius and/or HTK.
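The Language Model half of that pair is conceptually simple: it estimates how likely word sequences are, typically from n-gram counts over a text corpus. A toy maximum-likelihood bigram model, to illustrate the underlying idea (this is not VoxForge's or any engine's actual file format):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat slept".split()

# Count unigrams and adjacent word pairs (bigrams)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

# "the" occurs 3 times, twice followed by "cat", so P(cat | the) = 2/3
print(bigram_prob("the", "cat"))
```

Real dictation language models add smoothing so that unseen word pairs get a small nonzero probability instead of breaking recognition, but the counting step is exactly this.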

dbpedia.org - Using Wikipedia as a Web Database

by parmentierf & 7 others
dbpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. dbpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.
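Those sophisticated queries are expressed in SPARQL against DBpedia's public endpoint (http://dbpedia.org/sparql). Constructing one is just string building; the `dbo:birthPlace` property below follows the current DBpedia ontology namespace, which may differ from what the early prototype used:

```python
# A SPARQL query one might send to DBpedia's endpoint with any HTTP
# client (POST the query string with an appropriate results format).
def people_born_in(city_resource, limit=10):
    """Build a SPARQL query for people born in a given DBpedia resource."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person WHERE {{
  ?person dbo:birthPlace <{city_resource}> .
}} LIMIT {limit}
""".strip()

query = people_born_in("http://dbpedia.org/resource/Paris")
print(query)
```

Because DBpedia resources reuse Wikipedia article names as URIs, linking another dataset to Wikipedia amounts to emitting triples that point at these same resource URIs.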

2006

Official Google Research Blog: All Our N-gram are Belong to You

by parmentierf & 1 other (via)
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.
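Estimating such models starts with the step Google scaled up here: counting n-grams over the training corpus. At toy scale the whole pipeline fits in a few lines (the released Google dataset additionally pruned n-grams below a frequency threshold):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all word n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

tokens = "to be or not to be that is the question".split()
bigrams = ngram_counts(tokens, 2)
print(bigrams.most_common(1))  # [(('to', 'be'), 2)]
```

The "no data like more data" observation is about exactly these counts: with a trillion-word corpus, even rare word sequences occur often enough to be estimated reliably.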

2005

Textes en accès libre (freely accessible texts)

by parmentierf & 4 others (via)
Links to sites that offer free texts!

start [WaCky]

by parmentierf (via)
The WaCky Project is a nascent effort (I always liked the expression nascent effort) by a group of linguists to build or gather tools to use the web as a linguistic corpus.
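Using the web as a corpus starts with fetching pages and boiling the markup away to plain text before any linguistic processing. A dependency-free sketch with Python's stdlib HTML parser (real web-as-corpus pipelines of the kind WaCky targets also deduplicate, detect language, and filter boilerplate):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> elements."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def html_to_text(html):
    """Strip tags and collapse whitespace, keeping only visible text."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(" ".join(parser.parts).split())

page = "<html><head><style>p{}</style></head><body><p>Web as corpus.</p></body></html>"
print(html_to_text(page))  # Web as corpus.
```

Everything downstream (tokenization, tagging, concordancing) then runs on the extracted text exactly as it would on a conventionally compiled corpus.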

2004

Natural Language Toolkit

by parmentierf (via)
The Natural Language Toolkit is a suite of Python packages and data for natural language processing; it comes with extensive API documentation and tutorials. NLTK-Lite is the version under active development.
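The toolkit's bread and butter is turning raw text into countable units: tokenization, then frequency distributions. The same first steps in dependency-free Python, to show what the NLTK packages automate (NLTK's actual tokenizers are considerably smarter than this regex):

```python
import re
from collections import Counter

def tokenize(text):
    """Naive word tokenizer: lowercase alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())

text = "The Natural Language Toolkit is a suite of Python packages."
tokens = tokenize(text)

# A frequency distribution over the tokens (NLTK wraps this as FreqDist)
freq = Counter(tokens)
print(freq.most_common(3))
```

NLTK layers corpora, taggers, parsers, and classifiers on top of primitives like these, which is what makes it useful for corpus work rather than just string handling.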
