dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
☆97 · Updated last week
Alternatives and similar repositories for flowman
Users who are interested in flowman are comparing it to the libraries listed below.
- A simple Spark-powered ETL framework that just works ☆181 · Updated 2 months ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆123 · Updated 2 weeks ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆102 · Updated 2 years ago
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆235 · Updated 10 months ago
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆159 · Updated 3 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆231 · Updated last week
- The Internals of Spark on Kubernetes ☆72 · Updated 3 years ago
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆346 · Updated last year
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, .... ☆77 · Updated last week
- REST API for Apache Spark on K8S or YARN ☆109 · Updated 2 weeks ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆30 · Updated this week
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful … ☆145 · Updated last year
- Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier! ☆41 · Updated 2 weeks ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 · Updated 2 years ago
- Multiple node presto cluster on docker container ☆126 · Updated 3 years ago
- Use SQL to build ELT pipelines on a data lakehouse. ☆288 · Updated 3 years ago
- Spline agent for Apache Spark ☆200 · Updated 2 weeks ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆586 · Updated last year
- Apache DataLab (incubating) ☆153 · Updated 2 years ago
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake ☆291 · Updated 2 weeks ago
- Generate and Visualize Data Lineage from query history ☆327 · Updated 2 years ago
- A Table format agnostic data sharing framework ☆42 · Updated last year
- ☆40 · Updated 2 years ago
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆96 · Updated last year
- Snowflake Data Source for Apache Spark. ☆230 · Updated this week
- Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆176 · Updated this week
- Yet Another (Spark) ETL Framework ☆21 · Updated 2 years ago
- The Internals of Delta Lake ☆187 · Updated 3 weeks ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆196 · Updated last week
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆130 · Updated 2 weeks ago