Renien / ETL-Starter-Kit
Extract, Transform, Load (ETL) refers to a process used with databases, especially in data warehousing. This repository contains a starter kit for ETL-related work.
☆21 · Updated 7 years ago
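The three ETL stages can be illustrated with a minimal sketch. It assumes a small in-memory CSV as the source and an in-memory SQLite table as the target. The sample data, the `orders` table, and the function names `extract`, `transform`, `load` and `run_pipeline` are hypothetical illustrations, not code from the ETL-Starter-Kit repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for an extracted source file.
RAW_CSV = """order_id,customer,amount_usd
1,alice,19.99
2,bob,
3,carol,5.50
"""


def extract(text):
    """Extract: read raw rows (as dicts of strings) from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, normalise names."""
    cleaned = []
    for row in rows:
        if not row["amount_usd"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount_usd"]))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> load, then return (row count, total amount)."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount_usd) FROM orders").fetchone()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
```

Keeping each stage as a separate function means any one of them (for example, swapping SQLite for a real warehouse in `load`) can be replaced without touching the others.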
Related projects
Alternatives and complementary repositories for ETL-Starter-Kit
- Flink Examples · ☆39 · Updated 8 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark · ☆41 · Updated 7 years ago
- Apache Spark ETL Utilities · ☆40 · Updated 2 weeks ago
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization · ☆47 · Updated 7 years ago
- Spark structured streaming with Kafka data source and writing to Cassandra · ☆64 · Updated 4 years ago
- Sample processing code using Spark 2.1+ and Scala · ☆51 · Updated 4 years ago
- Schema Registry integration for Apache Spark · ☆39 · Updated last year
- Example project to show how to use Spark to read and write Avro/Parquet files · ☆50 · Updated 11 years ago
- High performance HBase / Spark SQL engine · ☆28 · Updated 2 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol · ☆34 · Updated 2 years ago
- Utilities for writing tests that use Apache Spark · ☆24 · Updated 5 years ago
- Apache Flink™ training material website · ☆79 · Updated 4 years ago
- An example of real-time stream processing using Spark Streaming, Kafka & Elasticsearch · ☆41 · Updated 8 years ago
- functionstest · ☆33 · Updated 8 years ago
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark · ☆51 · Updated 10 years ago
- Helpful user-defined functions / table-generating functions for Hive · ☆101 · Updated 8 years ago
- Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm · ☆105 · Updated 9 months ago
- A small project to show how to add lineage to Atlas when using Spark as ETL tool · ☆12 · Updated 7 years ago
- Spark with Scala example projects · ☆33 · Updated 5 years ago
- A sink to save Spark Structured Streaming DataFrame into Hive table · ☆23 · Updated 6 years ago
- PMML evaluator library for the Apache Hive data warehouse software (legacy codebase) · ☆13 · Updated 9 years ago
- UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy · ☆59 · Updated 10 months ago
- Custom state store providers for Apache Spark · ☆93 · Updated 2 years ago
- This tutorial provides a quick introduction to using Spark · ☆57 · Updated 8 years ago
- Schedoscope is a scheduling framework for pain-free agile development, testing, (re)loading, and monitoring of your datahub, lake, or what… · ☆96 · Updated 4 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds · ☆86 · Updated 8 months ago