apache / incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
☆879 · Updated this week
Related projects:
- Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark. ☆411 · Updated last year
- A set of reusable Java components that implement functionality common to any web crawler ☆233 · Updated last month
- Apache Nutch is an extensible and scalable web crawler ☆2,886 · Updated this week
- Elassandra = Elasticsearch + Apache Cassandra ☆1,714 · Updated 5 months ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆181 · Updated this week
- A scalable frontier for web crawlers ☆1,291 · Updated last year
- Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning ☆1,786 · Updated 3 years ago
- Mirror of Apache Samza ☆811 · Updated 3 weeks ago
- Open-source Enterprise Grade Search Engine Software ☆499 · Updated 2 years ago
- A Java library for stored queries ☆373 · Updated last year
- Distributed Big Data Orchestration Service ☆1,708 · Updated this week
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 · Updated 9 months ago
- Netflix's distributed Data Pipeline ☆794 · Updated last year
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ☆2,214 · Updated last week
- Compact in-memory representation of directed graph data ☆560 · Updated last year
- A platform for visualization and real-time monitoring of data workflows ☆1,181 · Updated 4 years ago
- Banana for Solr - A Port of Kibana ☆669 · Updated last month
- Distributed object store ☆1,740 · Updated this week
- Distributed Graph Database ☆5,248 · Updated last year
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆268 · Updated last year
- Cassandra Java Client ☆1,037 · Updated 6 months ago
- Carrot2: Text Clustering Algorithms and Applications ☆764 · Updated last week
- Apache OpenNLP ☆1,425 · Updated last week
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,177 · Updated 10 months ago
- Client library for collecting metrics. ☆741 · Updated this week
- Elasticsearch real-time search and analytics natively integrated with Hadoop ☆1,928 · Updated last week
- A Java library to detect and normalize URLs in text ☆782 · Updated last year
- Streaming MapReduce with Scalding and Storm ☆2,139 · Updated 2 years ago
- Work in progress transmit from Google Code ☆1,107 · Updated 6 years ago