forcedotcom / SiteCrawler
This is a Java library that can be used to crawl the content of some web properties (for example, www.salesforce.com and blogs.salesforce.com). It supports dynamic scaling out of the box, depending on available machine power (CPU, RAM) and network capacity. It also has a plugin structure, which allows others to write code (plugins) that act on …
☆23 Updated 3 weeks ago
Alternatives and similar repositories for SiteCrawler
Users who are interested in SiteCrawler are comparing it to the libraries listed below
- Mirror of Apache Cocoon ☆28 Updated last week
- Data abstraction, storage, discovery, and serving system ☆32 Updated 2 months ago
- Detect memory leaks in minutes without a heap dump. ☆17 Updated 8 years ago
- Simplified scalable aggregation and processing framework built upon Apache Camel. ☆22 Updated 6 years ago
- PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine) ☆39 Updated 6 years ago
- Lucene plugin for indexing and searching files stored in Baratine distributed filesystem ☆16 Updated 9 years ago
- Splot for Java: An Experimental IoT Machine-to-Machine Library for Monitoring, Control, and Automation ☆16 Updated 4 years ago
- Mirror of Apache MetaModel Membrane ☆16 Updated 6 years ago
- ☆10 Updated 7 years ago
- Herd-MDL, a turnkey managed data lake in the cloud. See https://finraos.github.io/herd-mdl/ for more information. ☆16 Updated 10 months ago
- Script Execution service ☆12 Updated 8 years ago
- Mirror of Apache Infrastructure Puppet Kitchen ☆9 Updated 6 years ago
- OLD Produces the UI bundle used by the Couchbase documentation site. ☆12 Updated 4 years ago
- Fork of swagger-api/swagger-parser ☆17 Updated 5 years ago
- Mirror of Apache Geronimo ☆37 Updated last year
- Core API for Silverpeas ☆50 Updated last week
- Optimized & enhanced end-user oriented web performance testing & beaconing (RUM) library ☆39 Updated 3 years ago
- ☆9 Updated 9 years ago
- Very basic web app project that grabs a twitter stream and runs it through Stanfords Core NLP ☆10 Updated 9 years ago
- Cloudfier is a model-driven tool for rapid development of business applications ☆22 Updated 3 months ago
- Plivo Java helper Library ☆35 Updated last month
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 Updated 4 months ago
- littleMock is an easy-to-install demo environment for Java performance tuning. Runs on Windows/Mac/Linux. ☆10 Updated 6 years ago
- Talend Component Kit (implementation repository) ☆33 Updated this week
- A secure proxy service for managing OneOps secrets. ☆13 Updated last year
- Interactive shell for Hadoop ☆21 Updated 3 years ago
- Windows installer for Groovy ☆12 Updated 3 years ago
- Secure REST service to index, search, retrieve and aggregate content from heterogeneous sources. ☆20 Updated 8 months ago
- Apache Maven JDeps Plugin ☆13 Updated last month
- Distributed processing framework for search solutions ☆81 Updated 2 years ago