public marks

PUBLIC MARKS from parmentierf with tags taln & web

2008 Blog: Query interfaces for the Semantic Web

An interesting presentation at Google Tech Talks about different interfaces for querying semantic data. Casual users were presented with 4 increasingly formal systems: keyword search, natural language search, controlled language search, and a graphical interface for building query patterns. Interestingly enough, the users liked natural language best, although keyword queries gave more accurate results.

uClassify - free text classifier web service

uClassify is a free web service where you can easily create your own text classifiers. Examples include spam filtering, web page categorization, automatic e-mail support, language detection, written-text gender recognition, and sentiment analysis. So what do you want to classify? Only your imagination is the limit!
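The kind of classifier such a service trains from user-supplied examples can be sketched as a minimal multinomial Naive Bayes model (the training texts, labels, and class names below are illustrative, not uClassify's actual data or API):

```python
from collections import Counter
import math

class NaiveBayesText:
    """Minimal multinomial Naive Bayes text classifier, the basic
    model behind simple spam filters and similar categorizers."""

    def __init__(self):
        self.word_counts = {}          # label -> Counter of word frequencies
        self.class_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.class_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, float("-inf")
        for label, counts in self.word_counts.items():
            logp = math.log(self.class_counts[label] / total_docs)  # prior
            n = sum(counts.values())
            for w in words:
                # Laplace smoothing so unseen words don't zero out the score
                logp += math.log((counts[w] + 1) / (n + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

clf = NaiveBayesText()
clf.train("spam", "buy cheap pills now")
clf.train("spam", "cheap offer click now")
clf.train("ham", "meeting agenda for tomorrow")
clf.train("ham", "lunch tomorrow with the team")
print(clf.classify("cheap pills offer"))  # → spam
```

A hosted service adds persistence, an HTTP API, and much better feature engineering on top, but the underlying counting-and-smoothing idea is the same.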

Home | OpenCalais

by 2 others
We want to make all the world's content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the Semantic Web or the Giant Global Graph - we call our piece of it Calais. Calais is a rapidly growing toolkit of capabilities that allow you to readily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application.

Cypher - Beta Release

The Cypher™ beta release is an AI software program that generates the .rdf (RDF graph) and .serql (SeRQL query) representation of a plain-language input, allowing users to speak plain language to update and query databases. With robust definition languages, Cypher's grammar and lexicon can quickly and easily be extended to process highly complex sentences and phrases of any natural language, and can cover any vocabulary. Equipped with Cypher, programmers can begin building next-generation semantic web applications that harness what is already the most widely used tool known to man: natural language.
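The idea of mapping plain language onto RDF statements and queries can be sketched with a toy pattern-based translator. This is only an illustration of the concept: the single "X knows Y" pattern, the FOAF-style output, and the function names are assumptions, not Cypher's actual grammar or lexicon machinery:

```python
import re

# One hard-coded pattern: statements of the form "X knows Y"
STATEMENT = re.compile(r"^(\w+) knows (\w+)$", re.IGNORECASE)
QUESTION = re.compile(r"^who does (\w+) know\??$", re.IGNORECASE)

def to_rdf(sentence):
    """Translate 'X knows Y' into a FOAF-style RDF triple (Turtle syntax)."""
    m = STATEMENT.match(sentence.strip())
    if not m:
        raise ValueError("unsupported sentence")
    subj, obj = m.group(1), m.group(2)
    return f"<#{subj}> foaf:knows <#{obj}> ."

def to_query(question):
    """Translate 'who does X know?' into a SPARQL query over foaf:knows."""
    m = QUESTION.match(question.strip())
    if not m:
        raise ValueError("unsupported question")
    return f"SELECT ?who WHERE {{ <#{m.group(1)}> foaf:knows ?who }}"

print(to_rdf("Alice knows Bob"))        # → <#Alice> foaf:knows <#Bob> .
print(to_query("who does Alice know?")) # → SELECT ?who WHERE { <#Alice> foaf:knows ?who }
```

A real system replaces the regular expressions with a full grammar and lexicon so that arbitrary vocabulary and sentence structure can be handled, but the output side (RDF triples for statements, SPARQL/SeRQL for questions) is the same shape.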

Official Google Research Blog: All Our N-gram are Belong to You

by 1 other
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.
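The core of a word n-gram model is just counting: estimate P(next word | previous words) from relative frequencies in a corpus. A minimal bigram version can be sketched in a few lines (toy corpus of my own; the production models described above add smoothing, backoff, and distributed counting over far larger data):

```python
from collections import Counter

def bigram_model(corpus):
    """Estimate bigram probabilities P(w2 | w1) by maximum likelihood:
    count(w1 w2) / count(w1), the basic recipe behind word n-gram models."""
    words = corpus.lower().split()
    # Count only words that have a successor, so denominators match
    unigrams = Counter(words[:-1])
    bigrams = Counter(zip(words, words[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

model = bigram_model("the cat sat on the mat the cat ran")
print(model[("the", "cat")])  # 2 of the 3 occurrences of "the" precede "cat"
```

Scaling this from a few billion to a trillion words changes nothing in the estimator itself; what changes is the engineering needed to count n-grams over a corpus that no single machine can hold.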