RovoMe / JIRLbot
Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond"
☆16 Updated 8 years ago
Alternatives and similar repositories for JIRLbot
Users who are interested in JIRLbot are comparing it to the libraries listed below.
- The LAW next generation crawler. ☆87 Updated 3 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆244 Updated last week
- Ingest processor doing language detection for fields ☆72 Updated 2 years ago
- Text retrieval database based on simhash similarity search ☆24 Updated 2 years ago
- A Java library capable of constructing character-sequence-storing, directed acyclic graphs of minimal size ☆42 Updated 12 years ago
- A high performance "thin wrapper" HTTP REST server on top of Apache Lucene ☆143 Updated last year
- Migrate Redis data from source to destination ☆9 Updated 5 years ago
- Mirror of Apache OpenNLP Add-ons ☆17 Updated last week
- B+-tree in Java that stores to disk using memory mapped files, supports range queries and duplicate keys ☆47 Updated 2 weeks ago
- Query preprocessor for Java-based search engines (Querqy Core and Solr implementation) ☆184 Updated last month
- Solr Redis Extensions ☆53 Updated last year
- Elasticsearch plugin for b-bit minhash algorithm ☆63 Updated last year
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆191 Updated this week
- Alternative Java platform, built from the ground up - with its own async I/O core and DI. Ultra high-performance, simple and minimalistic… ☆100 Updated 2 years ago
- Apache OpenNLP Sandbox ☆43 Updated this week
- Pure Java implementations of Murmur hash algorithms ☆75 Updated 2 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆216 Updated 2 years ago
- Welcome to the ArangoDB Careers repository! These are the current open positions at ArangoDB. If you want to join us on this great journ… ☆10 Updated 3 years ago
- Fastest word count in Java ☆17 Updated 9 years ago
- Benchmark of open source, embedded, memory-mapped, key-value stores available from Java (JMH) ☆142 Updated 2 years ago
- Distributed processing framework for search solutions ☆81 Updated 2 years ago
- XML/Document DB on top of distributed cache ☆41 Updated 6 years ago
- Apache NLPCraft - API to convert natural language into actions. ☆82 Updated last month
- ☆18 Updated 2 months ago
- Production-ready Java implementation of the Xor Filter. ☆17 Updated 5 years ago
- The next generation of open source search ☆92 Updated 8 years ago
- The JSON database for REST and Websocket storage ☆41 Updated 10 years ago
- World's fastest CSV parser / databinding for Java ☆16 Updated last week
- Java library that sorts very large files of records by splitting into smaller sorted files and merging ☆87 Updated 2 weeks ago
- Zulia Search Engine ☆33 Updated this week