pracdata / awesome-open-source-data-engineering
A curated list of open-source tools used in analytics platforms and the data engineering ecosystem
Related projects
Alternatives and complementary repositories for awesome-open-source-data-engineering
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects
- Quickstart for any service
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB, PostgreSQL and Superset
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…
- Open Data Stack Projects: Examples of End to End Data Engineering Projects
- A curated list of awesome blogs, videos, tools and resources about Data Contracts
- Possibly the fastest DataFrame-agnostic quality check library in town.
- A Python package that creates fine-grained dbt tasks on Apache Airflow
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a…
- Quick guides from Dremio on several topics
- Schema modelling framework for decentralised domain-driven ownership of data.
- Code for "Efficient Data Processing in Spark" Course
- New-generation open-source data stack
- Turning PySpark Into a Universal DataFrame API
- The smallest DuckDB SQL orchestrator on Earth.
- Step-by-step tutorial on building a Kimball dimensional model with dbt
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…
- The Open-Source Enterprise Data Platform in a single Portal
- A write-audit-publish implementation on a data lake without the JVM
- Demo DAGs that show how to run dbt Core in Airflow using Cosmos
- Code for dbt tutorial
- Data product portal created by Dataminded
- A curated list of awesome DataOps tools
- Dagster Labs' open-source data platform, built with Dagster.
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ…