Last updated: May 6th, 2007
Download: source (shell script)
This is a very simple web crawler that follows the links (hrefs) in HTML pages. After parsing each page, it records how many votes the page (and its domain) receives, with domain votes discounted. You can set the crawler's depth and width, as well as the start pages, in the config.py file. All information is kept in memory, so it is not meant for crawling a large number of sites.
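A minimal sketch of this kind of crawl, assuming Python 3 and only the standard library. It is not the author's crawler.py. The names (LinkParser, fetch_url, crawl) are hypothetical, and so is the reading of the parameters: depth is taken as the number of link levels followed from the start pages, width as the maximum number of new links followed from each page, and "discounting" domain votes is read as not counting links between pages of the same domain.

```python
import urllib.parse
import urllib.request
from collections import Counter, deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_url(url):
    """Download a page and decode it as UTF-8, replacing undecodable bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_pages, depth, width, fetch=fetch_url):
    """Breadth-first crawl kept entirely in memory.

    Returns (page_votes, domain_votes): Counters of incoming links.
    """
    page_votes, domain_votes = Counter(), Counter()
    seen = set(start_pages)
    queue = deque((url, 0) for url in start_pages)
    while queue:
        url, level = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        parser = LinkParser()
        parser.feed(html)
        source_domain = urllib.parse.urlsplit(url).netloc
        followed = 0
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            target, _ = urllib.parse.urldefrag(urllib.parse.urljoin(url, href))
            parts = urllib.parse.urlsplit(target)
            if parts.scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            page_votes[target] += 1  # every link is a vote for its target page
            # Assumption: links inside the same domain earn no domain vote.
            if parts.netloc != source_domain:
                domain_votes[parts.netloc] += 1
            # Follow at most `width` new links per page, up to `depth` levels.
            if level < depth and followed < width and target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
                followed += 1
    return page_votes, domain_votes
```

Passing a dictionary lookup as fetch, for example crawl(["http://a.com/"], depth=2, width=10, fetch=pages.__getitem__), lets you try the sketch without network access. As in the description above, everything lives in in-memory sets and counters, which is why this approach does not scale to large crawls.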
You can start it with:
python crawler.py
It writes its log to a crawler.log file; the log level is set in config.py.
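The settings mentioned above could be laid out in config.py along these lines. The variable names (START_PAGES, DEPTH, WIDTH, LOG_LEVEL) and the setup_logging helper are hypothetical, not taken from the actual source; the sketch only shows how a log level kept in config.py might be applied to crawler.log with Python's standard logging module.

```python
import logging

# --- config.py (hypothetical variable names) ---
START_PAGES = ["http://www.example.com/"]  # where the crawl begins
DEPTH = 2                 # link levels to follow from the start pages
WIDTH = 10                # at most this many new links followed per page
LOG_LEVEL = logging.INFO  # minimum level written to crawler.log


# --- in crawler.py (sketch) ---
def setup_logging(level=LOG_LEVEL, filename="crawler.log"):
    """Send log records at `level` and above to `filename`."""
    logging.basicConfig(
        filename=filename,
        level=level,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,  # replace any existing root handlers (Python 3.8+)
    )
    return logging.getLogger("crawler")
```

Raising LOG_LEVEL to logging.WARNING, for instance, would keep routine per-page messages out of the log file while still recording problems.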