dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
★96, updated last week
Alternatives and similar repositories for flowman
Users who are interested in flowman are comparing it to the repositories listed below:
- A simple Spark-powered ETL framework that just works (★182, updated 3 weeks ago)
- Smart Automation Tool for building modern Data Lakes and Data Pipelines (★124, updated last week)
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … (★158, updated 2 years ago)
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! (★233, updated 6 months ago)
- REST API for Apache Spark on K8S or YARN (★99, updated 2 months ago)
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 (★100, updated 2 years ago)
- A library that provides useful extensions to Apache Spark and PySpark. (★229, updated last month)
- ★80, updated 4 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. (★29, updated last week)
- A simplified, lightweight ETL Framework based on Apache Spark (★589, updated last year)
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, .... (★76, updated this week)
- Repository of helm charts for deploying DataHub on a Kubernetes cluster (★193, updated this week)
- The Internals of Spark on Kubernetes (★71, updated 3 years ago)
- Generate and Visualize Data Lineage from query history (★326, updated 2 years ago)
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. (★344, updated last year)
- DataQuality for BigData (★144, updated last year)
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake (★279, updated this week)
- Snowflake Data Source for Apache Spark. (★229, updated this week)
- A Table format agnostic data sharing framework (★38, updated last year)
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful … (★144, updated last year)
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. (★266, updated 4 months ago)
- The Internals of Delta Lake (★184, updated 7 months ago)
- Apache DataLab (incubating) (★152, updated last year)
- Use SQL to build ELT pipelines on a data lakehouse. (★288, updated 3 years ago)
- A library that brings useful functions from various modern database management systems to Apache Spark (★60, updated last year)
- Drop-in replacement for Apache Spark UI (★293, updated last week)
- ★40, updated 2 years ago
- A Python Library to support running data quality rules while the spark job is running (★189, updated 2 weeks ago)
- Adapter for dbt that executes dbt pipelines on Apache Flink (★95, updated last year)
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. (★76, updated last year)