Multi-stage, config driven, SQL based ETL framework using PySpark
☆26Sep 16, 2019Updated 6 years ago
Alternatives and similar repositories for spark-sql-etl-framework
Users who are interested in spark-sql-etl-framework are comparing it to the libraries listed below
- ☆16Apr 9, 2019Updated 6 years ago
- Scalable CDC Pattern Implemented using PySpark☆18Oct 8, 2025Updated 4 months ago
- Various data stream/batch process demo with Apache Scala Spark 🚀☆11Feb 28, 2020Updated 6 years ago
- Set of ETL utils for Spark☆15May 4, 2020Updated 5 years ago
- Implement a complete data warehouse etl using spark SQL☆14Sep 8, 2022Updated 3 years ago
- Data validation library for PySpark 3.0.0☆33Nov 11, 2022Updated 3 years ago
- Spark Structured Streaming JDBC Sink☆16Apr 26, 2021Updated 4 years ago
- A pyspark lib to validate data quality☆18Nov 11, 2022Updated 3 years ago
- Terraform plans & commands to provision Azure VMSS and VM from a VM image on demand or from a Jenkins pipeline.☆27Aug 9, 2018Updated 7 years ago
- HDF masterclass materials☆29Mar 28, 2016Updated 9 years ago
- Cloud based Data Platform based on Apache Spark☆27Feb 17, 2026Updated last week
- ☆10Jun 29, 2021Updated 4 years ago
- Spark data pipeline that processes movie ratings data.☆31Feb 5, 2026Updated 3 weeks ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications☆36Dec 15, 2024Updated last year
- This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse.☆46Jan 27, 2025Updated last year
- Code Samples for my Ververica Webinar "99 Ways to Enrich Streaming Data with Apache Flink"☆41Jan 4, 2022Updated 4 years ago
- ☆10Jan 28, 2025Updated last year
- Python3, NetworkX, Java, MLlib, Spark, Cassandra, Neo4j 3.0, Gephi, Docker☆11Jul 18, 2017Updated 8 years ago
- Collect and aggregate on spark events for profitz☆10Apr 22, 2022Updated 3 years ago
- A timer module for Redis☆11Oct 16, 2019Updated 6 years ago
- A Flink connector for JDBC with support for sharded databases and tables☆10Dec 31, 2021Updated 4 years ago
- An exploration of Flink and change-data-capture via flink-cdc-connectors☆11Jul 7, 2021Updated 4 years ago
- Second generation of the ICGC DCC release ETL built on Spark☆10Apr 8, 2019Updated 6 years ago
- ☆10Aug 13, 2021Updated 4 years ago
- The DBT package to support Snowflake Semantic View as a new materialization.☆50Nov 6, 2025Updated 3 months ago
- Integration of Iceberg table management into Spark SQL☆11Jan 21, 2020Updated 6 years ago
- Code samples, summaries, cheatsheets and other study material for Hadoop MapReduce and Apache Spark☆10Aug 17, 2018Updated 7 years ago
- Seckill (flash sale) project [PRC]☆10Apr 13, 2019Updated 6 years ago
- ☆10Jul 31, 2019Updated 6 years ago
- This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.☆40Aug 31, 2016Updated 9 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark☆46Jul 17, 2025Updated 7 months ago
- A collection of python utility functions☆11Feb 11, 2026Updated 2 weeks ago
- JSON schema to markdown generator☆10Nov 13, 2025Updated 3 months ago
- Framework for simpler Spark Pipelines☆11Feb 22, 2026Updated last week
- Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includ…☆14Dec 25, 2024Updated last year
- Exposes Redis stream through the command line☆12Jun 28, 2022Updated 3 years ago
- SQL for Redis☆11Sep 16, 2022Updated 3 years ago
- A Fully HiveServer2-like Multi-tenancy Spark Thrift Server Supporting Impersonation and Multi-SparkContext with Ranger Authorization (GO …☆10Jul 7, 2022Updated 3 years ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio.☆52Oct 31, 2023Updated 2 years ago