apache / stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
☆957 · Updated last week
Alternatives and similar repositories for stormcrawler
Users who are interested in stormcrawler are comparing it to the libraries listed below.
- Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark. ☆418 · Updated 2 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆251 · Updated this week
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆196 · Updated 3 weeks ago
- A scalable frontier for web crawlers ☆1,323 · Updated 7 months ago
- Apache Nutch is an extensible and scalable web crawler ☆3,115 · Updated this week
- Banana for Solr - A Port of Kibana ☆672 · Updated 5 months ago
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆276 · Updated 3 years ago
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,229 · Updated 2 years ago
- Open-source Enterprise Grade Search Engine Software ☆512 · Updated 3 years ago
- Carrot2: Text Clustering Algorithms and Applications ☆843 · Updated last week
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆223 · Updated 3 years ago
- ACHE is a web crawler for domain-specific search. ☆477 · Updated 4 months ago
- Carrot2 plugin for ElasticSearch ☆294 · Updated 3 years ago
- Work in progress, transferred from Google Code ☆1,126 · Updated 8 years ago
- A Scrapy pipeline which sends items to an Elasticsearch server ☆322 · Updated 3 years ago
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. ☆3,164 · Updated 2 weeks ago
- Crawljax ☆536 · Updated 2 years ago
- Mirror of Apache Samza ☆838 · Updated 8 months ago
- Language Detection Library for Java ☆585 · Updated 3 years ago
- Apache OpenNLP ☆1,577 · Updated this week
- The Apache Gora open source framework provides an in-memory data model and persistence for big data. ☆121 · Updated last year
- Mapper Attachments Type plugin for Elasticsearch ☆504 · Updated 2 years ago
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆447 · Updated 4 months ago
- A text tagger based on Lucene / Solr, using FST technology ☆177 · Updated 2 years ago
- A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector ☆252 · Updated 8 years ago
- Readability clone in Java ☆460 · Updated 5 years ago
- Elassandra = Elasticsearch + Apache Cassandra ☆1,719 · Updated 7 months ago
- A Java library to detect and normalize URLs in text ☆783 · Updated 6 months ago
- Just the facts -- web page content extraction ☆1,279 · Updated 6 months ago
- Netflix's distributed Data Pipeline ☆796 · Updated 2 years ago