Guide to the sample web scrapers
Web-watch-keywords
This sample periodically scans websites for keyword occurrences. The user specifies:
- the URLs to watch
- the keywords to watch for
- the timer start time
- the timer interval
This sample outputs a web page (HTML file). It can be modified to produce other file formats or database updates.
Installs with DTBuild.
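As a rough illustration of the keyword scan described above (not DTBuild's own implementation), the following Python sketch counts keyword occurrences in page text that has already been downloaded and renders the counts as an HTML table. The function names and the report layout are assumptions made for this example; fetching the pages and running on a timer are left out.

```python
import html


def count_keywords(text, keywords):
    """Return {keyword: count} using case-insensitive substring matching."""
    lowered = text.lower()
    return {kw: lowered.count(kw.lower()) for kw in keywords}


def scan_pages(pages, keywords):
    """pages maps each watched URL to its already-downloaded text."""
    return {url: count_keywords(text, keywords) for url, text in pages.items()}


def render_report(results):
    """Render the scan results as a minimal HTML table, one row per URL/keyword."""
    rows = []
    for url, counts in results.items():
        for kw, n in counts.items():
            rows.append(
                f"<tr><td>{html.escape(url)}</td>"
                f"<td>{html.escape(kw)}</td><td>{n}</td></tr>"
            )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Substring matching is the simplest choice here; it also counts a keyword that appears inside a longer word, so a stricter scan might match on word boundaries instead.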
Web-extract-number
This sample is configured to scrape stock quotes, but can easily be modified to scrape labeled numbers from other web pages by changing the Label and the input URL list.
Installs with DTBuild.
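The idea of a labeled number can be sketched in Python as follows. This is only an approximation of what the sample does: the function name and the regular expression are assumptions, and the tag stripping is deliberately crude.

```python
import re


def extract_labeled_number(page_html, label):
    """Return the first number that follows label in the page, or None."""
    # Crude tag stripping so that markup between label and value is ignored.
    text = re.sub(r"<[^>]*>", " ", page_html)
    # Label, then any non-digits, then an optionally signed number
    # that may contain thousands separators and a decimal part.
    pattern = re.escape(label) + r"\D*?([+-]?\d[\d,]*(?:\.\d+)?)"
    match = re.search(pattern, text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))
```

Changing the label argument is the counterpart of changing the Label setting in the sample: the same function picks up a different number from the same page.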
Web-extract-title-header
Extracts two items from each web page in the input URL list:
- the content of the web page's <title> tag
- the content of the first header tag as defined in the HeaderTag string set
The "rules" in the HeaderTag list determine which headers are extracted. These rules work well for the news headline sites in the sample input URL list, but will probably need modification for other sites.
Installs with DTBuild.
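A minimal Python sketch of this extraction is shown below. It assumes that the HeaderTag rules can be approximated by a plain set of tag names such as h1 and h2; the actual rule format used by the sample may differ, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TitleHeaderParser(HTMLParser):
    """Capture the <title> text and the text of the first matching header tag."""

    def __init__(self, header_tags):
        super().__init__()
        self.header_tags = set(header_tags)  # assumed stand-in for HeaderTag
        self.title = None
        self.header = None
        self._capturing = None  # name of the tag currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._capturing is not None:
            return  # nested tags inside a captured element are ignored
        wants_title = tag == "title" and self.title is None
        wants_header = tag in self.header_tags and self.header is None
        if wants_title or wants_header:
            self._capturing = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capturing is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        else:
            self.header = text
        self._capturing = None


def extract_title_and_header(page_html, header_tags=("h1", "h2")):
    parser = TitleHeaderParser(header_tags)
    parser.feed(page_html)
    parser.close()
    return parser.title, parser.header
```

Passing a different tuple of tag names plays the role of editing the HeaderTag list for a site whose headlines use other tags.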
Census-01
U.S. Census Bureau table scraper: state populations + areas
Extracts selected fields from a single table on the U.S. Census Bureau website and puts the extracted data in a database.
To run this sample:
- download and install DTBuild, if you haven't already
- download and install the Census-01 sample
- open the Census-01 sample (MS Access will start)
- press the Grab button
- wait for the scan to complete
- press the Output button to view the results
See the HTML table parser example for more information.
Installs separately. Requires MS Access 2000 or later.
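For readers without MS Access, the general technique (parse an HTML table, then store selected columns in a database) can be sketched in Python. This is not the sample's own code: SQLite stands in for Access, and the table name, column names, and column order are assumptions.

```python
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(table_html):
    parser = TableParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


def store_rows(conn, rows, columns=(0, 1, 2)):
    """Store name, population and area from the chosen columns (header row skipped)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS states "
        "(name TEXT, population INTEGER, area REAL)"
    )
    for row in rows[1:]:
        name, pop, area = (row[i] for i in columns)
        conn.execute(
            "INSERT INTO states VALUES (?, ?, ?)",
            (name, int(pop.replace(",", "")), float(area.replace(",", ""))),
        )
    conn.commit()
```

The columns argument corresponds to selecting fields: only the listed column positions are copied into the database, and the rest of each row is ignored.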
Census-02
U.S. Census Bureau table scraper: zip code data
Extracts selected fields from multiple zip code tables on the U.S. Census Bureau website. Just specify the zip codes you're interested in and press the Grab button.
To run this sample:
- download and install DTBuild, if you haven't already
- download and install the Census-02 sample
- open the Census-02 sample (MS Access will start)
- press the Zip Codes button and enter the zip codes
- press the Grab button
- wait for the scan to complete
- press the Output button to view the results
This sample demonstrates the use of SQL to convert the user-entered zip codes into a source URL table, which is then scanned to produce the desired output.
Installs separately. Requires MS Access 2000 or later.
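The zip-code-to-URL step can be illustrated with a short Python sketch that uses SQLite in place of MS Access. The table names (ZipCodes, SourceURLs) and the URL prefix are hypothetical placeholders, not the ones used by the sample or by the Census Bureau website.

```python
import sqlite3

# Hypothetical placeholder; the real sample builds Census Bureau URLs.
URL_PREFIX = "https://example.invalid/zip?code="


def build_url_table(conn, zip_codes):
    """Store the entered zip codes, then derive one source URL per zip code in SQL."""
    conn.execute("CREATE TABLE IF NOT EXISTS ZipCodes (zip TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO ZipCodes (zip) VALUES (?)",
        [(z,) for z in zip_codes],
    )
    # Rebuild the URL table from scratch on every run.
    conn.execute("DROP TABLE IF EXISTS SourceURLs")
    conn.execute("CREATE TABLE SourceURLs (zip TEXT, url TEXT)")
    conn.execute(
        "INSERT INTO SourceURLs (zip, url) "
        "SELECT zip, ? || zip FROM ZipCodes ORDER BY zip",
        (URL_PREFIX,),
    )
    conn.commit()
    cursor = conn.execute("SELECT url FROM SourceURLs ORDER BY zip")
    return [row[0] for row in cursor]
```

Keeping the conversion in a single INSERT ... SELECT statement means the scanner only ever reads the SourceURLs table and does not need to know how the URLs were derived.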