Blogmarks.net is a social bookmarking service.

Founded in late 2003 and headquartered in France, we are non-profit and independent.

We believe in the open web, think internet services should be sustainable, and build for the long term.

While we are re-launching the service, we only accept new members through invitation.

PUBLIC MARKS from parmentierf with tags corpus & texte

06 September 2006 07:30

Official Google Research Blog: All Our N-gram are Belong to You

by 1 other (via)

Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.

web texte google corpus taln

