RovoMe / JIRLbot
Java implementation of the Internet Research Lab Web Crawler (IRLbot), as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond".
☆17 Updated 7 years ago
Related projects
Alternatives and complementary repositories for JIRLbot
- Lucene Directory implementation for AWS S3 ☆39 Updated last month
- API definition, resources and reference implementation of URL Frontiers ☆46 Updated this week
- The LAW next generation crawler. ☆86 Updated 3 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆212 Updated last year
- XML/Document DB on top of distributed cache ☆41 Updated 5 years ago
- The JSON database for REST and Websocket storage ☆42 Updated 10 years ago
- Asynchronous search makes it possible for users to run queries in the background, allowing users to track the progress, and retrieve par… ☆23 Updated 3 years ago
- Java port of a concurrent trie hash map implementation from the Scala collections library ☆27 Updated 4 months ago
- Schema and type system for creating sortable byte[] ☆48 Updated 11 years ago
- HTML parser and tag balancer. ☆14 Updated 7 months ago
- Berkeley DB Java Edition ☆50 Updated 3 years ago
- ☆13 Updated 2 months ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop ☆56 Updated 3 years ago
- A Java library capable of constructing character-sequence-storing, directed acyclic graphs of minimal size ☆43 Updated 11 years ago
- A Java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str… ☆16 Updated 7 years ago
- Benchmark of open source, embedded, memory-mapped, key-value stores available from Java (JMH) ☆140 Updated last year
- SOLR bulk indexing utility for the command line. ☆45 Updated 3 months ago
- Production-ready Java implementation of the Xor Filter. ☆17 Updated 4 years ago
- BBoxDB is a scalable, highly available, and distributed data store for multi-dimensional big data. The software supports operations like … ☆54 Updated 6 months ago
- Unofficial mirror of HSQLDB (https://hsqldb.org/), namely HyperSQL Database. It is a relational database management system and a set of t… ☆73 Updated this week
- Java IMAP NIO client that is designed to scale well for thousands of connections per machine and reduce contention when using large numbe… ☆57 Updated last year
- B+-tree in Java that stores to disk using memory mapped files, supports range queries and duplicate keys ☆46 Updated 2 weeks ago
- A hashmap implementation for Java that stores map entries off-heap ☆70 Updated 4 years ago
- A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data wareho… ☆56 Updated 10 months ago
- Apache Lucene Microservice ☆15 Updated last year
- Master repository for the JHeaps project ☆47 Updated 3 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts ☆80 Updated 6 years ago
- A distributed in-memory key-value storage for billions of small objects. ☆23 Updated 5 years ago
- Apache Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a… ☆95 Updated last year