dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
★93 · Updated this week
Alternatives and similar repositories for flowman:
Users who are interested in flowman are comparing it to the libraries listed below.
- A simple Spark-powered ETL framework that just works 🍺 ★178 · Updated last year
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ★114 · Updated this week
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ★97 · Updated last year
- A library that provides useful extensions to Apache Spark and PySpark. ★205 · Updated last month
- A Python Library to support running data quality rules while the spark job is running ⚡ ★167 · Updated last week
- A Table format agnostic data sharing framework ★38 · Updated 11 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ★118 · Updated last month
- REST API for Apache Spark on K8S or YARN ★93 · Updated this week
- Snowflake Data Source for Apache Spark. ★222 · Updated last month
- A library that brings useful functions from various modern database management systems to Apache Spark ★58 · Updated last year
- The Internals of Delta Lake ★183 · Updated this week
- The Internals of Spark on Kubernetes ★70 · Updated 2 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ★63 · Updated 2 years ago
- Apache Hive Metastore as a Standalone server in Docker ★67 · Updated 4 months ago
- Spline agent for Apache Spark ★191 · Updated last week
- Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ★70 · Updated this week
- Extensible streaming ingestion pipeline on top of Apache Spark ★44 · Updated 9 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ★43 · Updated 11 months ago
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com) ★219 · Updated 3 weeks ago
- CLI tool to bulk migrate the tables from one catalog to another without a data copy ★70 · Updated this week
- Adapter for dbt that executes dbt pipelines on Apache Flink ★88 · Updated 9 months ago
- ★79 · Updated last year
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ★71 · Updated 3 years ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ★62 · Updated last month
- Flowchart for debugging Spark applications ★104 · Updated 3 months ago
- ★63 · Updated 5 years ago