forcedotcom / SiteCrawler

This is a Java library that can be used to crawl the content of some web properties (for example, www.salesforce.com and blogs.salesforce.com). It supports dynamic scaling out of the box, depending on available machine power (CPU, RAM) and network capacity. It also has a plugin structure, which allows others to write code (plugins) that acts on …
22 · Updated 3 years ago

Alternatives and similar repositories for SiteCrawler:

Users who are interested in SiteCrawler are comparing it to the libraries listed below.