public marks

PUBLIC MARKS with tag webcrawler

August 2006

Crawl-By-Example

by dcancel
The Crawl-By-Example project, a plugin for the Heritrix crawler, aims to improve the crawler's ability to find useful and interesting pages.

Ariel

by dcancel
A library that lets you extract information from semi-structured documents (such as websites). Ariel uses a small number of labeled examples to generate and learn effective extraction rules.

RDig - Ferret based full text search for web sites

by dcancel
RDig provides an HTTP crawler and content extraction utilities to help build site search for websites or intranets. Internally, it uses Ferret for full-text indexing.

June 2006

Search Engine Spiders and Robots Explained_SearchWeb

by jackiege
The small tool below is dedicated to checking the validity of robots.txt files: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

May 2006

SEO Part 1 - The Question Mark '?' in URLs - Yee

by jackiege & 1 other
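An article on webcredible.co.uk, "Does a question mark in the URL affect ranking?". Its author plotted how Google and Yahoo prioritize dynamic links for indexing on a chart. As the chart shows, the priority Google gives to dynamic URLs (those containing a question mark) is very ...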

February 2006

Heritrix

by dcancel
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project

December 2005

PUBLIC TAGS related to tag webcrawler

adsense, Algorithm, delicious, google, msn, opensource, ruby, rubyonrails, screenscraping, se, search, SearchEngine, sem, seo, seranking, software, thearchive, yahoo, 搜索引擎

Active users

dcancel
last mark : 23/08/2006 00:36

jackiege
last mark : 19/06/2006 00:35

rabbittom
last mark : 17/12/2005 04:08