dansandland / cassandra-scrapy-pipeline
☆15Updated 9 years ago
Alternatives and similar repositories for cassandra-scrapy-pipeline
Users who are interested in cassandra-scrapy-pipeline are comparing it to the libraries listed below
- A platform for real-time streaming search☆102Updated 9 years ago
- Code reference from my Qbox blog posts.☆87Updated 10 years ago
- A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine☆97Updated last year
- Natural Language Processing with Spark's MLlib☆62Updated 8 years ago
- Load a CSV (or TSV) file into an Elasticsearch instance☆62Updated 3 years ago
- Scrapy extension which writes crawled items to Kafka☆30Updated 7 years ago
- A javascript shell for elasticsearch☆106Updated 10 years ago
- Let's perform Twitter sentiment analysis using Python, Docker, Elasticsearch, and Kibana!☆138Updated 5 years ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…☆55Updated last year
- PySpark for Elastic Search☆55Updated 8 years ago
- Find which links on a web page are pagination links☆29Updated 8 years ago
- Traptor -- A distributed Twitter feed☆26Updated 3 years ago
- This is a simple streaming application that utilises Kafka and Python☆46Updated 6 years ago
- An extension of the kafka-python package that adds features like multiprocess consumers.☆39Updated 2 years ago
- Airflow workflow management platform chef cookbook.☆71Updated 6 years ago
- Deprecated. Formerly: scripts to make it easier to set up and manipulate clusters at Amazon EC2☆110Updated 13 years ago
- Automatic Item List Extraction☆87Updated 9 years ago
- This repository holds some python libraries and plugins designed to be used with MemSQL.☆62Updated 2 years ago
- Text classification using Naive Bayes and Elasticsearch☆152Updated 9 years ago
- Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python☆246Updated 2 years ago
- Docker compose files for various kafka stacks☆32Updated 7 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders.☆11Updated 10 years ago
- Few things we've met during our etl project based on spark☆24Updated 7 years ago
- An extended version of the official Elasticsearch Python client.☆63Updated 10 years ago
- Send summary messages of your Luigi jobs to Slack☆46Updated 6 years ago
- Data analysis tool.☆85Updated 2 years ago
- Sample repo for luigi tasks & config☆36Updated 9 years ago
- Data Pipeline Clientlib provides an interface to tail and publish to data pipeline topics.☆110Updated 3 years ago
- A guide for setting up Spark + PySpark under Ubuntu linux☆56Updated 8 years ago
- Slack notifications for the Luigi workflow manager☆46Updated 4 years ago