A simple Spark-powered ETL framework that just works 🍺
☆185 · Oct 2, 2025 · Updated 9 months ago
Alternatives and similar repositories for setl
Users who are interested in setl are comparing it to the libraries listed below.
- A simplified, lightweight ETL Framework based on Apache Spark ☆588 · Jan 24, 2024 · Updated 2 years ago
- Generate fake data for Scala and Spark ☆15 · Dec 19, 2025 · Updated 6 months ago
- Apache-Spark based Data Flow (ETL) Framework which supports multiple read, write destinations of different types and also supports multiple… ☆26 · Jun 7, 2021 · Updated 5 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆126 · Updated this week
- Qubole Sparklens tool for performance tuning Apache Spark ☆591 · Jun 26, 2024 · Updated 2 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆47 · Jul 17, 2025 · Updated 11 months ago
- An ETL framework in Scala for Data Engineers ☆23 · Aug 30, 2022 · Updated 3 years ago
- A home for LinkedIn's changes to Apache Iceberg ☆66 · Updated this week
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆191 · Oct 15, 2025 · Updated 8 months ago
- A library enabling DAG structuring of data processing programs such as ETLs ☆17 · Apr 13, 2026 · Updated 2 months ago
- Essential Spark extensions and helper methods ✨😲 ☆767 · Jun 22, 2026 · Updated last week
- A Docker setup using Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop) ☆12 · May 2, 2021 · Updated 5 years ago
- Apache Spark based ETL Engine ☆71 · Oct 18, 2016 · Updated 9 years ago
- Lab project to showcase Flink's performance differences between using a SQL query and implementing the same logic via the DataStream API ☆14 · Apr 15, 2020 · Updated 6 years ago
- Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from… ☆35 · Jan 5, 2023 · Updated 3 years ago
- Sample processing code using Spark 2.1+ and Scala ☆51 · Jun 28, 2020 · Updated 6 years ago
- An example pipeline that tests a Python project using pipenv for dependency management. ☆16 · Apr 14, 2026 · Updated 2 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆238 · Jun 5, 2026 · Updated 3 weeks ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆686 · Jun 9, 2026 · Updated 3 weeks ago
- A boilerplate project for Azure Big Data PaaS services ☆14 · Dec 7, 2022 · Updated 3 years ago
- ☆64 · Nov 8, 2019 · Updated 6 years ago
- ☆24 · Apr 21, 2023 · Updated 3 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 · Sep 16, 2019 · Updated 6 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 · Apr 24, 2024 · Updated 2 years ago
- The dbt-spark-livy adapter allows you to use dbt along with Apache Spark, by connecting via Apache Livy ☆12 · Mar 30, 2023 · Updated 3 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 · May 13, 2026 · Updated last month
- Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines ☆17 · Jan 21, 2020 · Updated 6 years ago
- A SparkSQL formatter based on https://github.com/zeroturnaround/sql-formatter, with customizations and extra features. ☆14 · Nov 7, 2024 · Updated last year
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,625 · Updated this week
- 🐋 Docker image for AWS Glue Spark/Python ☆23 · Sep 5, 2023 · Updated 2 years ago
- A COBOL parser and Mainframe/EBCDIC data source for Apache Spark ☆167 · Jun 22, 2026 · Updated last week
- Atomic Scala Book Solutions - for Beginners and first time Functional Programmers ☆12 · Mar 10, 2020 · Updated 6 years ago
- A curated list of awesome Apache Spark packages and resources. ☆1,882 · Feb 27, 2026 · Updated 4 months ago
- Scalable CDC Pattern Implemented using PySpark ☆18 · Oct 8, 2025 · Updated 8 months ago
- Data Lineage Tracking And Visualization Solution ☆661 · Jun 22, 2026 · Updated last week
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆458 · Apr 2, 2026 · Updated 3 months ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆306 · Oct 30, 2025 · Updated 8 months ago
- The Internals of Spark SQL ☆487 · Jan 25, 2026 · Updated 5 months ago
- A library to mutate parquet files ☆20 · May 9, 2023 · Updated 3 years ago