Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
☆97 · Feb 28, 2026 · Updated this week
Alternatives and similar repositories for flowman
Users who are interested in flowman compare it to the libraries listed below.
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆160 · Dec 10, 2022 · Updated 3 years ago
- ☆10 · May 16, 2022 · Updated 3 years ago
- A library enabling DAG structuring of data processing programs such as ETLs ☆17 · Dec 13, 2025 · Updated 2 months ago
- Horizon Exchange REST API Server ☆11 · Jan 21, 2026 · Updated last month
- ☆12 · Jul 10, 2022 · Updated 3 years ago
- A Python library to support running data quality rules while the Spark job is running ⚡ ☆200 · Updated this week
- Code snippets used in demos recorded for the blog. ☆38 · Feb 17, 2026 · Updated 2 weeks ago
- Capture, save, and analyze AWS Redshift performance metrics ☆17 · Oct 6, 2017 · Updated 8 years ago
- Very large scale k-mer counting and analysis on Apache Spark. ☆18 · Feb 22, 2026 · Updated last week
- Code examples for the Introduction to Kubeflow course ☆14 · Jan 12, 2021 · Updated 5 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide ☆17 · Sep 13, 2020 · Updated 5 years ago
- Write SQL in Scala ☆30 · Nov 25, 2025 · Updated 3 months ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆587 · Jan 24, 2024 · Updated 2 years ago
- This repo contains samples for EMR Studio feature. ☆21 · Nov 15, 2022 · Updated 3 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,588 · Feb 17, 2026 · Updated 2 weeks ago
- Observability Python library - Powered by Kensu ☆22 · Oct 15, 2024 · Updated last year
- An Apache Hive SerDe (short for serializer/deserializer) for the Ion file format. ☆31 · Mar 27, 2025 · Updated 11 months ago
- Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and… ☆28 · Mar 20, 2023 · Updated 2 years ago
- ☆22 · Jan 23, 2023 · Updated 3 years ago
- Data Lineage Tracking And Visualization Solution ☆656 · Updated this week
- Find out which countries have won the most medals and how the participation of nations has changed over time, with R ☆10 · Aug 22, 2021 · Updated 4 years ago
- A better bridge between Apache Spark and PostgreSQL ☆23 · Sep 11, 2023 · Updated 2 years ago
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆347 · May 31, 2024 · Updated last year
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,286 · Feb 10, 2025 · Updated last year
- Druid service descriptor and parcel for Cloudera CDH5 ☆31 · Sep 3, 2019 · Updated 6 years ago
- ☆10 · May 25, 2021 · Updated 4 years ago
- Operator for Apache Superset for Stackable Data Platform ☆35 · Updated this week
- ☆43 · Feb 20, 2016 · Updated 10 years ago
- Scala framework for collecting performance metrics and conducting sound experimental benchmarking. ☆13 · Nov 19, 2025 · Updated 3 months ago
- Simple chatbot created using Rasa ☆10 · Feb 20, 2021 · Updated 5 years ago
- Trino load balancer with support for routing, queueing and auto-scaling ☆37 · Feb 17, 2026 · Updated 2 weeks ago
- Color detection beginner data science project ☆13 · Dec 6, 2020 · Updated 5 years ago
- Contains all code examples discussed in the deep learning course at Algorithmica ☆11 · Oct 1, 2020 · Updated 5 years ago
- Multi-hop declarative data pipelines ☆124 · Feb 25, 2026 · Updated last week
- WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging … ☆31 · Oct 28, 2025 · Updated 4 months ago
- Export Airflow metrics (from MySQL) in Prometheus format ☆29 · Apr 15, 2025 · Updated 10 months ago
- ☆33 · Mar 12, 2017 · Updated 8 years ago
- GitHub bot for keeping your Bazel dependencies up-to-date and clean ☆27 · Mar 20, 2020 · Updated 5 years ago
- Hands-on tutorial on adversarial examples 😈. With Streamlit app ❤️. ☆31 · Jun 17, 2022 · Updated 3 years ago