SETL-Framework/setl

A simple Spark-powered ETL framework that just works 🍺

☆186

Alternatives and similar repositories for setl

Users who are interested in setl are comparing it to the libraries listed below.

YotpoLtd / metorikku
A simplified, lightweight ETL Framework based on Apache Spark
☆588 · Jan 24, 2024 · Updated 2 years ago
sparsecode / DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…
☆26 · Jun 7, 2021 · Updated 5 years ago
smart-data-lake / smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
☆129 · Updated this week
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆47 · Jul 17, 2025 · Updated last year
linkedin / iceberg
A home for LinkedIn's changes to Apache Iceberg
☆65 · Updated this week
swoop-inc / spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
☆191 · Oct 15, 2025 · Updated 9 months ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated 3 weeks ago
Pathairush / airflow_hive_spark_sqoop
A docker using the airflow with Hadoop ecosystem (hive, spark, and sqoop)
☆12 · May 2, 2021 · Updated 5 years ago
ververica / lab-sql-vs-datastream
Lab project to showcase Flink's performance differences between using a SQL query and implementing the same logic via the DataStream API
☆14 · Apr 15, 2020 · Updated 6 years ago
basin-etl / basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from…
☆35 · Jan 5, 2023 · Updated 3 years ago
bartosz25 / spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
☆51 · Jun 28, 2020 · Updated 6 years ago
absognety / atomic-scala
Atomic Scala Book Solutions - for Beginners and first time Functional Programmers
☆12 · Mar 10, 2020 · Updated 6 years ago
buildkite / python-pipenv-example
An example pipeline that tests a Python project using pipenv for dependency management.
☆16 · Apr 14, 2026 · Updated 3 months ago
G-Research / spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
☆238 · Jul 1, 2026 · Updated 3 weeks ago
mrpowers-io / quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
☆687 · Jun 9, 2026 · Updated last month
valdasm / azure-big-data-starter
A boilerplate project for Azure Big Data PaaS services
☆14 · Dec 7, 2022 · Updated 3 years ago
airbnb / sputnik
☆64 · Nov 8, 2019 · Updated 6 years ago
microsoft / Data-Quality-Rule-Engine
☆24 · Apr 21, 2023 · Updated 3 years ago
databrickslabs / dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used …
☆483 · Updated this week
avensolutions / spark-sql-etl-framework
Multi-stage, config driven, SQL based ETL framework using PySpark
☆26 · Sep 16, 2019 · Updated 6 years ago
CoxAutomotiveDataSolutions / waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
☆76 · Apr 24, 2024 · Updated 2 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
cloudera / dbt-spark-livy
The dbt-spark-livy adapter allows you to use dbt along with Apache Spark, by connecting via Apache Livy
☆12 · Mar 30, 2023 · Updated 3 years ago
largecats / sparksql-formatter
A SparkSQL formatter based on https://github.com/zeroturnaround/sql-formatter, with customizations and extra features.
☆14 · Nov 7, 2024 · Updated last year
webysther / aws-glue-docker
🐋 Docker image for AWS Glue Spark/Python
☆23 · Sep 5, 2023 · Updated 2 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,636 · Updated this week
awesome-spark / awesome-spark
A curated list of awesome Apache Spark packages and resources.
☆1,885 · Feb 27, 2026 · Updated 4 months ago
AbsaOSS / cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
☆167 · Jun 22, 2026 · Updated last month
databrickslabs / tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins…
☆342 · Jul 10, 2026 · Updated last week
avensolutions / cdc-at-scale-using-spark
Scalable CDC Pattern Implemented using PySpark
☆18 · Oct 8, 2025 · Updated 9 months ago
AbsaOSS / spline
Data Lineage Tracking And Visualization Solution
☆662 · Updated this week
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆458 · Apr 2, 2026 · Updated 3 months ago
KyloIO / kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies…
☆22 · Jan 10, 2019 · Updated 7 years ago
japila-books / spark-sql-internals
The Internals of Spark SQL
☆487 · Jan 25, 2026 · Updated 5 months ago
homeaway / datapull
Cloud based Data Platform based on Apache Spark
☆28 · Jun 30, 2026 · Updated 3 weeks ago
VyuWing-Learning / Data-Engineering-Bootcamp-Apache-Spark
☆13 · Oct 15, 2021 · Updated 4 years ago
microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago
linkedin / transport
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…
☆306 · Jun 29, 2026 · Updated 3 weeks ago