This parser first searches the web page for one of the entries in the "TableLocator" string set. This allows skipping forward to the table to be scraped.
The parser then skips three table rows (the three TR nodes with no actions) and starts collecting data at the fourth table row (TR).
For each remaining row in the table, it parses the first five table data (TD) tags.
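The steps above (locate the table, skip three rows, read the first five cells of each remaining row) can be sketched in Python. This is only an illustration of the logic, not the parser's own node definitions: the function name `scrape_table`, the regular expressions, and the sample locator strings are all assumptions, since the actual contents of the "TableLocator" string set are not shown here.

```python
import re

# Hypothetical locator strings; the real "TableLocator" entries are not given.
TABLE_LOCATORS = ["Population by State", "Census Table"]

ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr\s*>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td\s*>", re.IGNORECASE | re.DOTALL)
TABLE_END_RE = re.compile(r"</table\s*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def scrape_table(page, locators=TABLE_LOCATORS, skip_rows=3, max_cells=5):
    """Return the data rows as lists of up to max_cells cell strings."""
    # Step 1: skip forward to the earliest occurrence of any locator string.
    positions = [page.find(loc) for loc in locators]
    positions = [p for p in positions if p != -1]
    if not positions:
        return []
    start = min(positions)

    # Only look at the text up to the end of the located table.
    end_match = TABLE_END_RE.search(page, start)
    body = page[start:end_match.start()] if end_match else page[start:]

    # Step 2: skip the leading rows; step 3: keep the first cells of the rest.
    result = []
    for row in ROW_RE.findall(body)[skip_rows:]:
        cells = CELL_RE.findall(row)[:max_cells]
        # Drop any markup nested inside a cell and trim whitespace.
        result.append([TAG_RE.sub("", cell).strip() for cell in cells])
    return result
```

Changing `skip_rows` or `max_cells` adjusts the sketch to tables with a different number of header rows or columns.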
The numeric fields are "cleaned up" with the Extract-Digits node group. For now, Extract-Digits only removes commas, so that the fields can be loaded into numeric database fields. It can be extended to remove other non-numeric characters.
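As a rough Python equivalent of this cleanup step, the sketch below removes commas and, separately, shows one possible extension. Both function names are hypothetical, and the extended variant is an assumption about what "other non-numeric characters" might cover, not a description of Extract-Digits itself.

```python
import re


def extract_digits(field):
    """Remove commas so that a value like '4,447,100' can be loaded as a number."""
    return field.replace(",", "")


def extract_digits_extended(field):
    """Assumed extension: keep only digits, the decimal point, and a minus sign."""
    return re.sub(r"[^0-9.\-]", "", field)
```

For example, `int(extract_digits("4,447,100"))` yields the integer 4447100, while the extended variant would also strip a currency symbol or stray spaces.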
This parser is simplified by the use of several other node groups:
| Node group | Description |
|---|---|
| TR | HTML table row parser |
| TD | HTML table cell parser |
| HTML-element | general-purpose HTML element parser |
| HTML-entity | HTML entity parser |
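To illustrate why element and entity handling are useful helpers for a cell parser, here is a small Python sketch that turns the raw contents of one table cell into plain text. The function name `cell_text` is hypothetical, and the approach (a regular expression plus the standard library's `html.unescape`) is a stand-in for the node groups listed above, not their actual implementation.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def cell_text(cell_html):
    """Return the plain text of a table cell's contents."""
    # Remove nested elements first, so that a decoded '<' from an entity
    # such as '&lt;' is never mistaken for the start of a tag.
    without_tags = TAG_RE.sub("", cell_html)
    # Then decode HTML entities such as '&amp;' into their characters.
    return html.unescape(without_tags).strip()
```

Keeping tag removal and entity decoding as separate steps mirrors the way the table parser delegates those tasks to smaller, reusable parsers.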
This parser is included as part of the DT Census table scraper sample. It can be adapted to parse HTML tables with any number of columns.