dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
☆97 Updated last week
Alternatives and similar repositories for flowman
Users who are interested in flowman are comparing it to the libraries listed below.
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆122 Updated this week
- A simple Spark-powered ETL framework that just works ☆182 Updated 3 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆103 Updated 3 years ago
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆235 Updated last year
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 Updated last week
- REST API for Apache Spark on K8S or YARN ☆108 Updated last month
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆160 Updated 3 years ago
- Use SQL to build ELT pipelines on a data lakehouse. ☆288 Updated 3 years ago
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆346 Updated last year
- A simplified, lightweight ETL Framework based on Apache Spark ☆588 Updated 2 years ago
- The Internals of Spark on Kubernetes ☆72 Updated 3 years ago
- Snowflake Data Source for Apache Spark. ☆230 Updated 2 weeks ago
- Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆180 Updated this week
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆30 Updated last week
- Generate and Visualize Data Lineage from query history ☆327 Updated 2 years ago
- DataQuality for BigData ☆147 Updated 2 years ago
- The Internals of Delta Lake ☆187 Updated 2 months ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆76 Updated 4 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆132 Updated 3 weeks ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, .... ☆80 Updated this week
- Tool to automate data quality checks on data pipelines ☆257 Updated 3 years ago
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆96 Updated last year
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com) ☆253 Updated 2 weeks ago
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful … ☆144 Updated last year
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 Updated 3 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 Updated last year
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake ☆296 Updated this week
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 Updated 2 years ago
- ☆63 Updated 6 years ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆136 Updated 2 years ago