PUBLIC MARKS with tag parser

This month

October 2009

Whatpm::HTML - An HTML Parser and Serializer

by karlcow

rdfa_parser | gemcutter | awesome gem hosting

by karlcow

Yields each triple, or generates an in-memory graph.

pyparsing

by karlcow

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. With pyparsing, you don't need to learn a new syntax for defining grammars or matching expressions - the parsing module provides a library of classes that you use to construct the grammar directly in Python.

ONLamp.com: Building Recursive Descent Parsers with Python

by karlcow

What is "parsing"? Parsing is processing a series of symbols to extract their meaning. Typically, this means reading the words of a sentence and drawing information from them. When application programs need to process data that is provided as text, they must use some form of parsing logic. This logic scans the text characters and character groups (words) and recognizes patterns of groups to extract the underlying commands or information.
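
As a rough illustration of the idea described in this excerpt, here is a minimal recursive descent sketch using only the Python standard library. It is not taken from the ONLamp article; the grammar, the names tokenize, Parser and evaluate, and the choice of arithmetic expressions as input are all assumptions made for this example. Each grammar rule becomes one method, and the methods call each other to recognize groups of tokens.

```python
import re

# Tokens: integers, arithmetic operators, and parentheses (leading whitespace skipped).
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Scan the characters and group them into a list of token strings."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Hypothetical grammar, chosen only for illustration:

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it (None at end of input).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        # Consume and return the next token.
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression given as text."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Calling evaluate("2 + 3 * (4 - 1)") walks expr, term and factor in turn, so multiplication binds tighter than addition simply because term sits below expr in the grammar.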

Snowball

by karlcow & 1 other

Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball, and presents several useful stemmers which have been implemented using it.

August 2009

Character encoding detection for external scripts

by karlcow

This is (EF BB BF) C3 B6 3D 22 21 22 loaded into browsers under various labels. That happens to be properly formed ECMAScript code for all the encodings used. The bogus results for Opera9 can easily be reproduced in the context of the testing script, but probably not individually from a clean cache; what's going on there is unknown. I also noted in running these tests that Opera claims "Opera supports the entire ECMA-262 2nd and 3rd standards with no exceptions" while in fact their implementation does not: the parser rejects code that follows the IdentifierStart :: UnicodeEscapeSequence production of ECMA-262 section 7.6. Instead it implements Opera-only extensions, like comma-free arrays à la [ 1 2 3 ]. Other fun facts include: IE does not implement onload for iframes and cannot modify the innerHTML of tr elements; Firefox ignores "tags" when setting the innerHTML of dynamically created tr elements with no ownerElement... Oh, and Opera again needs </th> "tags" so it won't nest adjacent th elements when setting innerHTML.
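
To make the byte sequence in this excerpt concrete, the sketch below decodes it under a few encoding labels in Python. It only illustrates how the same bytes read differently depending on the label; it does not reproduce the browser tests. The function name decode_under_labels and the particular labels chosen are assumptions for this example.

```python
# The byte sequence quoted in the excerpt; the UTF-8 byte order mark
# (EF BB BF) is the optional prefix shown there in parentheses.
BOM = bytes.fromhex("EF BB BF")
PAYLOAD = bytes.fromhex("C3 B6 3D 22 21 22")


def decode_under_labels(data, labels):
    """Decode the same bytes under each encoding label and collect the results."""
    return {label: data.decode(label) for label in labels}


# Without the BOM: UTF-8 reads C3 B6 as one character (U+00F6),
# while ISO-8859-1 reads it as two characters (U+00C3, U+00B6).
plain = decode_under_labels(PAYLOAD, ["utf-8", "iso-8859-1"])

# With the BOM: the "utf-8-sig" codec strips it, whereas plain "utf-8"
# keeps it as a leading U+FEFF character.
with_bom = decode_under_labels(BOM + PAYLOAD, ["utf-8-sig", "utf-8"])

for label, text in {**plain, **with_bom}.items():
    print(label, ascii(text))
```

Under UTF-8 the result is a six-character assignment to a one-letter identifier, which is why the choice of label changes what a script parser sees.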

RDFa Fragment Parser

by karlcow

Paste a chunk of XHTML RDFa below, and click "Parse."

Make sure you do the right thing for RDFa validation when you eventually place this chunk inside a web page.

July 2009

Sparkles everywhere, CubicWeb gets fizzy (CubicWeb's Forge)

by karlcow

Fyzz parses the SPARQL query and generates something we decided to call an AST although it's still a bit rough for now. Fyzz understands simple triples, distincts, limits, offsets and other basic functionalities.

John Resig - HTML 5 Parsing

by karlcow

If you're interested in giving the new parser a try (it's doubtful that you'll see many obvious changes - but any help in hunting down bugs would be appreciated) you can download a nightly of Firefox, open about:config, and set html5.enable to true.

May 2009

Python Package Index : pyWxSVG 0.1

by karlcow

View and print SVG files or SVG content, and convert SVG to raster graphics. Partial support for the SVG format. Tested with Python 2.5 and wxPython 2.8.9.2. Drawing uses the wx.GraphicsContext class. The path parser is the SVGPathParser class from Enable.

March 2009

RFC (2)822 & 3696 Email Address Parser in PHP

by karlcow

The test suite shows results for each parser, based on these test definitions. These are borrowed from Dominic Sayers who has a similar parser. We are still arguing over certain tests ;)

February 2009

Les parsers HTML5 - La Tortue Cynique / The Cynical Turtle

by karlcow

In short, we therefore need a dedicated parser (after 30 years of working with generic GML and SGML parsers),

January 2009

November 2008

PHP Simple HTML DOM Parser

by srcmax & 7 others, 3 comments
  • An HTML DOM parser written in PHP 5+ that lets you manipulate HTML very easily.
  • Requires PHP 5+.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors, just like jQuery.
  • Extract contents from HTML in a single line.

PHP Simple HTML DOM Parser

by Spone & 7 others, 3 comments
  • An HTML DOM parser written in PHP 5 that lets you manipulate HTML very easily.
  • Requires PHP 5.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors, just like jQuery.
  • Extract contents from HTML in a single line.

October 2008
