RovoMe / JIRLbot
Java implementation of the Internet Research Lab Web Crawler (IRLbot), as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond".
☆16 · Updated 7 years ago
Alternatives and similar repositories for JIRLbot:
Users who are interested in JIRLbot are comparing it to the libraries listed below.
- The LAW next-generation crawler. ☆87 · Updated 3 years ago
- API definition, resources and reference implementation of URL Frontiers ☆47 · Updated 2 weeks ago
- Lucene Directory implementation for AWS S3 ☆41 · Updated 3 months ago
- WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing,… ☆110 · Updated 2 years ago
- Production-ready Java implementation of the Xor Filter. ☆17 · Updated 5 years ago
- ☆9 · Updated 4 years ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop ☆56 · Updated 3 years ago
- Common Crawl Index Server ☆65 · Updated last week
- The JSON database for REST and WebSocket storage ☆42 · Updated 10 years ago
- Common web archive utility code. ☆52 · Updated last month
- A Distributed Java Reverse Proxy ☆25 · Updated this week
- Smart and autonomous cache in a redis module ☆18 · Updated 2 years ago
- Yet another JanusGraph, Cassandra/Scylla and Elasticsearch in Docker Compose setup ☆59 · Updated 4 years ago
- Java IMAP NIO client that is designed to scale well for thousands of connections per machine and reduce contention when using large numbe… ☆58 · Updated last year
- Distributed Messaging Framework based on Netty, Apache Ignite, gRPC ☆13 · Updated 6 years ago
- An example Dremio ARP-driven connector that supports SQLite