dansandland / cassandra-scrapy-pipeline
☆15 Updated 9 years ago
Alternatives and similar repositories for cassandra-scrapy-pipeline:
Users that are interested in cassandra-scrapy-pipeline are comparing it to the libraries listed below
- A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine ☆97 Updated 11 months ago
- High Level Kafka Scanner ☆19 Updated 7 years ago
- This is a simple streaming application that utilises Kafka and Python ☆45 Updated 6 years ago
- Docker Compose files for various Kafka stacks ☆32 Updated 7 years ago
- Find which links on a web page are pagination links ☆29 Updated 8 years ago
- Send summary messages of your Luigi jobs to Slack ☆46 Updated 5 years ago
- Natural Language Processing with Spark's MLlib ☆62 Updated 7 years ago
- Open source analytics platform powered by Apache Cassandra, Spark, and Kafka ☆34 Updated 9 years ago
- ☆23 Updated 7 years ago
- PySpark for Elasticsearch ☆55 Updated 8 years ago
- A highly configurable Google Cloud Dataflow pipeline that writes data from Pub/Sub into a Google BigQuery table ☆67 Updated 6 years ago
- dllib is a distributed deep learning library running on Apache Spark ☆32 Updated 7 years ago
- A curated list of awesome Apache Spark packages and resources. ☆40 Updated 8 years ago
- Computes and visualizes the sentiment analysis of tweets of US States in real-time using Storm. ☆26 Updated 10 years ago
- Scrapy extension which writes crawled items to Kafka ☆30 Updated 6 years ago
- A DC/OS time series demo ☆62 Updated 9 years ago
- A platform for real-time streaming search ☆103 Updated 9 years ago
- A collection of datasets and databases ☆24 Updated 6 years ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe… ☆55 Updated 10 months ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆58 Updated 4 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 Updated 8 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆11 Updated 9 years ago
- Luigi Plugin for Hubot ☆35 Updated 8 years ago
- docker scrapyd scrapy boot2docker crawler - a spider Python application that can be "Dockerized". ☆42 Updated 9 years ago
- Let's perform Twitter sentiment analysis using Python, Docker, Elasticsearch, and Kibana! ☆137 Updated 4 years ago
- A JavaScript shell for Elasticsearch ☆105 Updated 9 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 Updated 8 years ago
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations ☆40 Updated 10 months ago
- ☆32 Updated last year
- 🌆 TouristFriend API lets you query Google Places, Yelp and Foursquare at the same time, with Bayesian rankings! ☆29 Updated 6 years ago