November 2008
PHP Simple HTML DOM Parser
- An HTML DOM parser written in PHP 5+ that lets you manipulate HTML very easily.
- Requires PHP 5+.
- Supports invalid HTML.
- Find tags on an HTML page with selectors just like jQuery.
- Extract contents from HTML in a single line.
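The selector idea in the list above can be illustrated with a rough sketch. This is not the PHP library's API: it is a Python stand-in built on the standard `html.parser` module, and the names `TagFinder` and `find` are hypothetical. It matches only a tag name plus an optional class, and it tolerates one kind of invalid HTML (a matching element that is never closed); anything beyond that is out of scope here.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect the text of elements matching a tag name and optional class.

    Hypothetical helper for illustration; not part of any real library.
    """

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.buffer = None   # list of text chunks while inside a match
        self.results = []

    def flush(self):
        # Close the element currently being collected, if any.
        if self.buffer is not None:
            self.results.append("".join(self.buffer).strip())
            self.buffer = None

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            # A new match implicitly closes an unclosed previous one.
            self.flush()
            self.buffer = []

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.flush()


def find(html, tag, css_class=None):
    """Return the text content of every matching element, in document order."""
    finder = TagFinder(tag, css_class)
    finder.feed(html)
    finder.close()   # let the parser emit any text it is still holding
    finder.flush()   # close a match left open at end of input
    return finder.results


if __name__ == "__main__":
    # The <li> elements are never closed, which is invalid but common HTML.
    broken = '<ul><li class="item">one<li class="item">two</ul>'
    print(find(broken, "li", "item"))  # ['one', 'two']
```

The single call `find(html, "li", "item")` mirrors the "single line" extraction the list describes, though a real selector engine handles far more syntax than this.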
October 2008
August 2008
HTML5 parser integrated in W3C Markup Validator from Olivier Thereaux on 2008-08-25 (www-validator@w3.org from August 2008)
Aeracode :: Graphs, Python and CSS
I could only find one Python CSS library, cssutils, and while that seemed to have very decent CSS2 support for parsing into a document tree, I couldn't see any immediate way of using it for retrieving the applicable values for, say, a grid object with class "minor" inside a wavegraph object.
firefox mozilla/parser/htmlparser/src/nsHTMLTokenizer.cpp
The Performance Cost of the HTML Tree Builder
Xerces is faster. Namespaces are worse than the much-maligned HTML “extra fix-ups” (21% hit vs. 7% hit). An XML parser can be slow.
June 2008
May 2008
Parser Generators
John Resig - Pure JavaScript HTML Parser
Is HTML5 parsing hard to implement? (I also contemplated porting the HTML 5 parser, wholesale, but that seemed like a herculean effort.)
April 2008
Ian Bicking: a blog :: Python HTML Parser Performance
A performance comparison of several parsers and document models. The situation is a little complex because there are different steps in handling HTML: (1) parse the HTML, (2) parse it into something (a document object), (3) serialize it.
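The three steps can be timed separately, which is why the comparison gets complicated. The following is a minimal sketch using only the Python standard library, not the benchmark from the linked post: `TreeBuildingParser`, `parse_only`, `parse_to_tree`, `serialize` and `timed` are hypothetical names, and the tree builder assumes well-formed input with no void elements such as `<br>`.

```python
import time
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TreeBuildingParser(HTMLParser):
    """Feed parser events into an ElementTree builder (well-formed input only)."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        self.builder.start(tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def parse_only(html):
    """Step 1: tokenize the HTML and discard every event."""
    parser = HTMLParser()
    parser.feed(html)
    parser.close()


def parse_to_tree(html):
    """Step 2: parse into a document object (an ElementTree root element)."""
    parser = TreeBuildingParser()
    parser.feed(html)
    parser.close()
    return parser.builder.close()


def serialize(root):
    """Step 3: turn the document object back into markup."""
    return ET.tostring(root, encoding="unicode", method="html")


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rows = "".join(f'<p class="row">item {i}</p>' for i in range(1000))
    doc = f"<html><body>{rows}</body></html>"
    _, t_parse = timed(parse_only, doc)
    root, t_tree = timed(parse_to_tree, doc)
    _, t_ser = timed(serialize, root)
    print(f"parse only: {t_parse:.4f}s")
    print(f"parse + build tree: {t_tree:.4f}s")
    print(f"serialize: {t_ser:.4f}s")
```

Reporting the steps separately shows whether a library's cost lies in tokenizing, in building the document object, or in writing it back out. Absolute numbers from a sketch like this say little about the parsers the post actually compares.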
March 2008
Messages in a bottle » Blog Archive » Grune and Jacobs, Parsing Techniques, Second Edition
The second edition of Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel J. H. Jacobs, has now appeared.
February 2008
PHP Simple HTML DOM Parser
January 2008
librdfa - a pure C RDFa parser from Manu Sporny on 2008-01-31 (public-rdf-in-xhtml-tf@w3.org from January 2008)
librdfa is a pure C implementation of a standards-compliant RDFa parser. The library is quite easy to use (there are only 5 functions). librdfa is stream-based, very small and quite fast.
