Cargill / pipewrench
Data pipeline automation tool
☆25 Updated 8 months ago
Related projects:
- Flink Examples ☆39 Updated 8 years ago
- Schema Registry integration for Apache Spark ☆39 Updated last year
- Starter project for building MemSQL Streamliner Pipelines ☆32 Updated 7 years ago
- Cascading on Apache Flink® ☆54 Updated 7 months ago
- Spark structured streaming with Kafka data source and writing to Cassandra ☆64 Updated 4 years ago
- functionstest ☆33 Updated 7 years ago
- Spark to Tableau Extractor library ☆18 Updated 6 years ago
- A set of utilities to help with management of Streamsets pipelines. ☆13 Updated 6 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 Updated 6 months ago
- Example project to show how to use Spark to read and write Avro/Parquet files ☆50 Updated 11 years ago
- A small project to show how to add lineage to Atlas when using Spark as ETL tool ☆12 Updated 7 years ago
- Spark cloud integration: tests, cloud committers and more ☆19 Updated 6 months ago
- phData Pulse application log aggregation and monitoring ☆13 Updated 4 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 Updated 3 years ago
- An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase ☆14 Updated 8 years ago
- Ansible playbook for automated HDP 2.x deployment install with Kerberos ☆19 Updated 8 years ago
- Ambari and Cloudera Manager in Docker ☆22 Updated 5 years ago
- Quark is a data virtualization engine over analytic databases. ☆98 Updated 7 years ago
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization ☆47 Updated 7 years ago
- A rough prototype of a tool for discovering Apache Hive schemas from JSON documents. ☆42 Updated 9 months ago
- ☆48 Updated 6 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 Updated 2 years ago
- Random implementation notes ☆33 Updated 11 years ago
- Hadoop Data Pipeline using Falcon ☆15 Updated 8 years ago
- Demo for Kafka Connect with JDBC and HDFS Connectors ☆0 Updated 3 months ago
- Recipes and examples for Apache Spark ☆13 Updated 9 years ago
- A utility for generating Oozie workflows from a YAML definition ☆48 Updated 5 years ago
- Spooker is a dynamic framework for processing high volume data streams via processing pipelines ☆29 Updated 8 years ago
- ☆20 Updated this week