tjake / stormscraper
A Storm based web crawler with Cassandra backend
☆28 · Updated 11 years ago
Alternatives and similar repositories for stormscraper
Users interested in stormscraper are comparing it to the libraries listed below.
- VoltDB Click Stream Processing Example. ☆16 · Updated 7 years ago
- [Deprecated] Simple docker image to run an Elasticsearch server ☆22 · Updated 8 years ago
- A chef cookbook for deploying spark ☆30 · Updated 12 years ago
- An Elasticsearch Plugin that notifies about changes to indices ☆92 · Updated 9 years ago
- Deprecated - Check out MemSQL Pipelines instead! ☆8 · Updated 8 years ago
- Data Science Research Architecture, Data Center OS ☆21 · Updated 9 years ago
- A nozzle to spray a kafka topic at an HTTP endpoint. This project is deprecated and not maintained. ☆49 · Updated 5 years ago
- Storm Spout + Kafka State Inspector ☆58 · Updated 5 years ago
- Big GeoSpatial Data Points Visualization Tool ☆19 · Updated 9 years ago
- Firefly is a web application aimed at powerful, flexible time series graphing for web developers. ☆172 · Updated 4 years ago
- Parse wikipedia dumps and index (some) page data to elasticsearch ☆49 · Updated 9 years ago
- Stand-alone ANSI SQL for Cascading on Apache Hadoop ☆48 · Updated 7 years ago
- A collection of efficient utilities for a data scientist. ☆41 · Updated 10 years ago
- Sample custom Nifi processor to process tcpdump ☆18 · Updated 9 years ago
- Docker image for Consul ElasticSearch ☆12 · Updated 10 years ago
- Tail a log file and send log lines automatically to a kafka topic ☆57 · Updated 13 years ago
- juttle execution engine ☆36 · Updated 9 years ago
- Muppet ☆127 · Updated 4 years ago
- Turn-key deployments of DC/OS on AWS (template and onprem), Azure, and GCE ☆14 · Updated last year
- CSV river for ElasticSearch ☆91 · Updated 8 years ago
- A set of components designed to retrieve data from third-party APIs and storage systems, and to pass that data in to a DataSift account. ☆9 · Updated 7 years ago
- Docker containers for Druid nodes ☆27 · Updated 9 years ago
- Graph Analytics Engine ☆260 · Updated 10 years ago
- Exelixi is a distributed framework based on Apache Mesos, mostly implemented in Python using gevent for high-performance concurrency. It … ☆133 · Updated 11 years ago
- Kerberos, LDAP, Active Directory, PKI/SSL/TLS and host/ip based ACL coarse-grained and document level security for elasticsearch (Authent… ☆170 · Updated 5 years ago
- Nutch with Cassandra and Elasticsearch on Docker ☆17 · Updated 3 years ago
- Light-weight monitoring for DCOS ☆9 · Updated 9 years ago
- docker image with graphite-api and graphite-influxdb ☆39 · Updated 8 years ago
- A javascript shell for elasticsearch ☆105 · Updated 10 years ago
- A Seriously Fun guide to Big Data Analytics in Practice ☆169 · Updated 10 years ago