dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
☆97 Updated this week
Alternatives and similar repositories for flowman
Users who are interested in flowman are comparing it to the libraries listed below.
- A simple Spark-powered ETL framework that just works ☆181 Updated 3 months ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆123 Updated this week
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆235 Updated 11 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆103 Updated 2 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 Updated 3 weeks ago
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆160 Updated 3 years ago
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆346 Updated last year
- A simplified, lightweight ETL Framework based on Apache Spark ☆588 Updated last year
- REST API for Apache Spark on K8S or YARN ☆109 Updated last month
- Use SQL to build ELT pipelines on a data lakehouse. ☆288 Updated 3 years ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, .... ☆78 Updated last week
- DataQuality for BigData ☆146 Updated 2 years ago
- ☆81 Updated 8 months ago
- Apache DataLab (incubating) ☆153 Updated 2 years ago
- Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆177 Updated this week
- Generate and Visualize Data Lineage from query history ☆327 Updated 2 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 Updated 2 years ago
- The Internals of Spark on Kubernetes ☆72 Updated 3 years ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆136 Updated 2 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆30 Updated this week
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. ☆267 Updated 9 months ago
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful … ☆145 Updated last year
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆131 Updated this week
- The Internals of Delta Lake ☆187 Updated last month
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 Updated last year
- Spline agent for Apache Spark ☆201 Updated last month
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake ☆292 Updated last week
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆201 Updated last week
- A Table format agnostic data sharing framework ☆42 Updated last year
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆97 Updated last year