dansandland / cassandra-scrapy-pipeline
☆15 Updated 8 years ago
Related projects
Alternatives and complementary repositories for cassandra-scrapy-pipeline
- A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine ☆96 Updated 7 months ago
- High Level Kafka Scanner ☆19 Updated 7 years ago
- An extension of the kafka-python package that adds features like multiprocess consumers. ☆38 Updated last year
- A JavaScript shell for Elasticsearch ☆105 Updated 9 years ago
- dllib is a distributed deep learning library running on Apache Spark ☆32 Updated 7 years ago
- Open source analytics platform powered by Apache Cassandra, Spark, and Kafka ☆34 Updated 9 years ago
- Spark Application: Spark Summit 2018: Streaming Trend Discovery ☆11 Updated 6 years ago
- Scrapy extension which writes crawled items to Kafka ☆30 Updated 6 years ago
- A platform for real-time streaming search ☆103 Updated 8 years ago
- HopsYARN Tensorflow Framework. ☆33 Updated 5 years ago
- A cookiecutter template for Apache Spark applications written in Scala ☆10 Updated 5 years ago
- ☆14 Updated 8 years ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe… ☆55 Updated 5 months ago
- Natural Language Processing with Spark's MLlib ☆62 Updated 7 years ago
- Deprecated - Check out MemSQL Pipelines instead! ☆8 Updated 7 years ago
- PySpark for Elasticsearch ☆55 Updated 7 years ago
- A collection of datasets and databases ☆24 Updated 6 years ago
- Find which links on a web page are pagination links ☆29 Updated 7 years ago
- Provides a Pythonic interface for reading and writing Avro schemas ☆26 Updated 2 years ago
- Readability/Boilerpipe extraction in Python ☆55 Updated 8 years ago
- This is an introduction to Apache Spark DataFrames. ☆41 Updated 9 years ago
- Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics. ☆109 Updated 2 years ago
- ☆32 Updated 10 months ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆57 Updated 3 years ago