newfront / odsc-west-streaming-trends
All Data, Relevant Information, Scripts, and Applications for the Open Data Science Conference (2018)
☆11 Updated 6 years ago
Alternatives and similar repositories for odsc-west-streaming-trends
Users who are interested in odsc-west-streaming-trends are comparing it to the repositories listed below.
- Utilities for writing tests that use Apache Spark. ☆24 Updated 6 years ago
- Tools for Hadoop ☆25 Updated 13 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- Airflow workflow management platform chef cookbook. ☆71 Updated 6 years ago
- Random implementation notes ☆34 Updated 12 years ago
- Avro Schema Shredder is a REST API that enables storage of Avro Schemas in Apache Atlas. This API enables an organization to use Apache A… ☆13 Updated 8 years ago
- Apache Airflow CI pipeline ☆19 Updated 6 years ago
- ☆22 Updated 6 years ago
- An example PySpark project with pytest ☆16 Updated 7 years ago
- Provides a Pythonic interface for reading and writing Avro schemas ☆27 Updated 2 years ago
- Service for automatically managing and cleaning up unreferenced data ☆46 Updated last week
- InsightEdge Core ☆20 Updated 3 weeks ago
- ☕⛵ WIP PySpark dependency management ☆22 Updated 7 years ago
- Tools to deploy Hadoop on EMC Isilon ☆17 Updated 8 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 Updated 4 years ago
- Sneaking interactivity into static SQL statements ☆19 Updated 5 years ago
- An application that records stats about consumer group offset commits and reports them as prometheus metrics ☆14 Updated 6 years ago
- Reproducing Distributed Systems and Experiments on Cloud ☆40 Updated last year
- type-class based data cleansing library for Apache Spark SQL ☆78 Updated 6 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 Updated 5 years ago
- 💻 CLI for reporting events to Faros platform ☆14 Updated 2 months ago
- Puppet module to provision Airbnb's Airflow ☆19 Updated 3 years ago
- functionstest ☆33 Updated 8 years ago
- phData Pulse application log aggregation and monitoring ☆13 Updated 5 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- Hadoop Profiler, or hprofiler, is a tool which is able to analyze on- and off-CPU workloads on distributed computing environments. ☆24 Updated 9 years ago
- Use cases built on SnappyData. Use cases contained here: 1. Ad Analytics 2. Streaming data ingestion from RabbitMQ. ☆32 Updated 2 years ago
- Data pipeline automation tool ☆26 Updated last year
- ETLy is an add-on dashboard service on top of Apache Airflow. ☆69 Updated last year
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆51 Updated 3 weeks ago