commoncrawl / commoncrawl-crawler
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
☆214 Updated last year
Related projects:
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 Updated 8 years ago
- Elasticsearch Index Termlist ☆117 Updated 5 years ago
- A text tagger based on Lucene / Solr, using FST technology ☆173 Updated 9 months ago
- distributed realtime searchable database ☆115 Updated 10 years ago
- Custom graph algorithms for Neo4j with own Java and REST APIs ☆34 Updated 8 years ago
- Distributed processing framework for search solutions ☆81 Updated last year
- Solr Dictionary Annotator (Microservice for Spark) ☆70 Updated 4 years ago
- Educational Example of a custom Lucene Query & Scorer ☆48 Updated 4 years ago
- ☆28 Updated 8 years ago
- SIREn - Semi-Structured Information Retrieval Engine ☆106 Updated 3 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. ☆281 Updated 6 years ago
- WARC (Web Archive) Input and Output Formats for Hadoop ☆35 Updated 9 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆233 Updated last month
- Skywalker for Elasticsearch is like Luke for Lucene ☆79 Updated 4 years ago
- Storm / Solr Integration ☆19 Updated 7 months ago
- Mirror of Apache Blur ☆33 Updated 5 years ago
- ☆106 Updated this week
- Repackaging of Boilerpipe published on Maven Central Repository. ☆53 Updated 9 months ago
- Building recommenders with Elastic Graph! ☆37 Updated 4 years ago
- Using latent Dirichlet allocation (LDA) in Apache Lucene ☆58 Updated 11 years ago
- Sample code, data, and configuration for the book ☆188 Updated 3 years ago
- NLP tools developed by Emory University. ☆60 Updated 8 years ago
- Elasticsearch plugin for b-bit minhash algorithm ☆62 Updated 3 months ago
- Apache OpenNLP Sandbox ☆42 Updated this week
- command line tool for Apache Lucene ☆161 Updated last month
- Mirror of Apache Stanbol (incubating) ☆112 Updated 6 months ago
- The next generation of open source search ☆90 Updated 7 years ago
- Lucene Auto Phrase TokenFilter implementation ☆59 Updated 6 years ago
- Fusion demo app searching open-source project data from the Apache Software Foundation ☆42 Updated 5 years ago
- Fabric-based framework for deploying and managing SolrCloud clusters in the cloud. ☆90 Updated 5 years ago