avensolutions/spark-sql-etl-framework

Multi-stage, config driven, SQL based ETL framework using PySpark

☆26

Alternatives and similar repositories for spark-sql-etl-framework

Users who are interested in spark-sql-etl-framework are comparing it to the libraries listed below.

avensolutions / cdc-at-scale-using-spark
Scalable CDC Pattern Implemented using PySpark
☆18 · Oct 8, 2025 · Updated 9 months ago
NAVEENKUMARMURUGAN / Pyspark-ETL-Framework
☆16 · Apr 9, 2019 · Updated 7 years ago
yennanliu / spark-etl-pipeline
Various data stream/batch process demo with Apache Scala Spark 🚀
☆12 · Feb 28, 2020 · Updated 6 years ago
konrads / spark-etl
Set of ETL utils for Spark
☆15 · May 4, 2020 · Updated 6 years ago
basin-etl / basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from…
☆35 · Jan 5, 2023 · Updated 3 years ago
guidok91 / spark-movies-etl
Spark data pipeline that processes movie ratings data.
☆31 · Jul 12, 2026 · Updated last week
mshtelma / spark-structured-streaming-jdbc-sink
Spark Structured Streaming JDBC Sink
☆16 · Apr 26, 2021 · Updated 5 years ago
jasonsatran / spark-meta
Spark data profiling utilities
☆23 · Nov 24, 2018 · Updated 7 years ago
databricks-demos / dbt-databricks-c360
Demo running DBT as a Databricks Workflow task
☆13 · Nov 13, 2024 · Updated last year
Hamza88-coder / Real-Time-Recruitment-System-with-AI-and-Data-Analytics
Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includ…
☆14 · Dec 25, 2024 · Updated last year
bomeng / Heracles
High performance HBase / Spark SQL engine
☆28 · Jul 7, 2022 · Updated 4 years ago
simonellistonball / masterclass-hdf
HDF masterclass materials
☆29 · Mar 28, 2016 · Updated 10 years ago
vim89 / datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…
☆56 · May 6, 2023 · Updated 3 years ago
homeaway / datapull
Cloud based Data Platform based on Apache Spark
☆28 · Jun 30, 2026 · Updated 3 weeks ago
Azure / terraform-with-jenkins-samples
Terraform plans & commands to provision Azure VMSS and VM from a VM image on demand or from a Jenkins pipeline.
☆26 · Aug 9, 2018 · Updated 7 years ago
d-e-n-t-y / pg_fdw_mv_rewrite
☆10 · Jul 31, 2019 · Updated 6 years ago
aws-samples / dbtgluenyctaxidemo
☆11 · Oct 11, 2022 · Updated 3 years ago
tupol / spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
☆36 · Dec 15, 2024 · Updated last year
aws-samples / amazon-emr-optimize-data-processing
Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark
☆14 · Apr 14, 2023 · Updated 3 years ago
randerzander / HiveToPhoenix
An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase
☆14 · Mar 23, 2016 · Updated 10 years ago
hammerlab / spark-util
low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
snowflakedb / snowflake-rest-api-specs
Public rest api specs for Snowflake
☆24 · Jul 8, 2026 · Updated last week
aws-samples / aws-codeguru-profiler-python-demo-application
☆12 · Apr 17, 2024 · Updated 2 years ago
aws-samples / spark-streaming-sql-s3-connector
An Apache Spark Structured Streaming S3 connector for reading S3 files using Amazon S3 event notifications to AWS SQS
☆16 · Feb 13, 2024 · Updated 2 years ago
AbePabbathi / lakehouse-tacklebox
This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse.
☆45 · Jan 27, 2025 · Updated last year
Snowflake-Labs / spcs-templates
☆16 · Jun 30, 2026 · Updated 2 weeks ago
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆47 · Jul 17, 2025 · Updated last year
knaufk / enrichments-with-flink
Code Samples for my Ververica Webinar "99 Ways to Enrich Streaming Data with Apache Flink"
☆41 · Jan 4, 2022 · Updated 4 years ago
databricks-industry-solutions / json2spark-schema
Converting a json schema to a spark schema (struct) representation
☆14 · Mar 18, 2025 · Updated last year
Oracen / dbtvault-generator
Generate DBT Vault files from yml metadata!
☆20 · Jul 27, 2023 · Updated 2 years ago
tmalaska / CopybookInputFormat
Using JRecord to build a mapred and mapreduce inputformat for HDFS, MAPREDUCE, PIG, HIVE, Spark, ...
☆19 · Dec 7, 2017 · Updated 8 years ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
aws-samples / intelligent-rag-bedrockagent-iac
This repository will provide code to build end-to-end IAC code to build an intelligent GenAI chatbot based on Amazon Bedrock
☆12 · Jun 13, 2025 · Updated last year
aljoscha / blog
Thoughts on things I find interesting.
☆17 · Dec 19, 2024 · Updated last year
twosigma / postgresql-contrib
☆13 · Jun 7, 2018 · Updated 8 years ago
shwethags / atlas-lineage
Example to create lineage in Atlas with sqoop and spark
☆14 · Apr 5, 2017 · Updated 9 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
sourcegraph / phabricator-extension
Get code intelligence on Phabricator
☆15 · Jul 3, 2026 · Updated 2 weeks ago
mahapatra09 / aflux
☆10 · Dec 16, 2022 · Updated 3 years ago