RovoMe / JIRLbot
Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond"
☆16 Updated 8 years ago
Alternatives and similar repositories for JIRLbot
Users who are interested in JIRLbot are comparing it to the libraries listed below.
- The LAW next generation crawler. ☆87 Updated 3 years ago
- API definition, resources and reference implementation of URL Frontiers ☆48 Updated last month
- Common web archive utility code. ☆55 Updated 2 weeks ago
- command line tool for Apache Lucene ☆162 Updated 2 months ago
- Solr Redis Extensions ☆53 Updated last year
- Mirror of Apache James jdkim ☆22 Updated 2 weeks ago
- SOLR bulk indexing utility for the command line. ☆44 Updated 2 months ago
- Asynchronous search makes it possible for users to run queries in the background, allowing users to track the progress, and retrieve par… ☆23 Updated 4 years ago
- Migrate Redis data from source to destination ☆9 Updated 4 years ago
- Redis search and indexing in Java ☆15 Updated 8 years ago
- Highly performant, lightweight framework for linked data processing. Supports RDFa, JSON-LD, RDF/XML and plain text formats, runs on Andr… ☆52 Updated 2 years ago
- Benchmarks for the RediSearch module ☆44 Updated 2 years ago
- A java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str… ☆16 Updated 8 years ago
- An elasticsearch plugin to create hierarchical aggregations ☆51 Updated 2 months ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆216 Updated 2 years ago
- Lucene Directory implementation for AWS S3 ☆41 Updated 3 months ago
- ☆17 Updated 10 years ago
- Java imap nio client that is designed to scale well for thousands of connections per machine and reduce contention when using large numbe… ☆60 Updated 3 months ago
- Yuvi is an in-memory storage engine for recent time series metrics data. ☆48 Updated 7 years ago
- An Elasticsearch plugin for rescoring based on Redis keys ☆30 Updated 3 years ago
- A high performance "thin wrapper" HTTP REST server on top of Apache Lucene ☆143 Updated last year
- Browser-driven explorer for lucene indexes ☆74 Updated 3 years ago
- Querqy for Elasticsearch ☆46 Updated last month
- ☆29 Updated 2 weeks ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop ☆56 Updated 4 years ago
- Welcome to the ArangoDB Careers repository! These are the current open positions at ArangoDB. If you want to join us on this great journ… ☆10 Updated 3 years ago
- Ingest processor doing language detection for fields ☆72 Updated 2 years ago
- Github chatbot and web-content indexer/searcher ☆25 Updated 4 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆188 Updated this week
- A hashmap implementation for Java that stores map entries off-heap ☆70 Updated 5 years ago