RovoMe / JIRLbot
Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond"
☆16 · Updated 8 years ago
Alternatives and similar repositories for JIRLbot
Users who are interested in JIRLbot are comparing it to the libraries listed below.
- The LAW next generation crawler. ☆88 · Updated 3 years ago
- API definition, resources and reference implementation of URL Frontiers ☆52 · Updated last month
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆194 · Updated this week
- An elasticsearch plugin to create hierarchical aggregations ☆51 · Updated 3 weeks ago
- A set of reusable Java components that implement functionality common to any web crawler ☆246 · Updated 2 weeks ago
- Ingest processor doing language detection for fields ☆72 · Updated 2 years ago
- A scalable, mature and versatile web crawler based on Apache Storm ☆930 · Updated this week
- Query preprocessor for Java-based search engines (Querqy Core and Lucene implementation) ☆187 · Updated this week
- A high performance "thin wrapper" HTTP REST server on top of Apache Lucene ☆144 · Updated last year
- A text tagger based on Lucene / Solr, using FST technology ☆177 · Updated last year
- Entity resolution for Elasticsearch. ☆162 · Updated 7 months ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆73 · Updated last year
- This is a Fact based Question Answering System using Apache Solr as backend search engine, Wikipedia dumps as information source, Apache … ☆26 · Updated last month
- Stemmer for German ☆45 · Updated 3 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆219 · Updated 2 years ago
- An Elasticsearch plugin to aggregate Geo Points in clusters. ☆60 · Updated last month
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆274 · Updated 2 years ago
- Carrot2 plugin for ElasticSearch ☆291 · Updated 2 years ago
- Various utility scripts for running Lucene performance tests ☆218 · Updated this week
- Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms. ☆388 · Updated 3 weeks ago
- Zulia Search Engine ☆33 · Updated this week
- SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm ☆18 · Updated 10 years ago
- Lucene Directory implementation for AWS S3 ☆44 · Updated 6 months ago
- Github mirror of "search/extra" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for c… ☆55 · Updated last month
- A simple tutorial of Lucene for LIS 501 Introduction to Text Mining students at the University of Wisconsin-Madison (Fall 2021). ☆75 · Updated last year
- Benchmark of open source, embedded, memory-mapped, key-value stores available from Java (JMH) ☆142 · Updated 2 years ago
- Starter Reverse Proxy Configuration for Solr ☆47 · Updated 10 years ago
- Standalone versions of LUCENE_5205 and other patches: SpanQueryParser, Concordance and Co-occurrence stats ☆18 · Updated 4 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts ☆81 · Updated 7 years ago
- Mirror of Apache OpenNLP Add-ons ☆17 · Updated 2 weeks ago