KennethanCeyer / awesome-data-pipeline
Awesome list for datapipeline
☆31 · Updated last year
Alternatives and similar repositories for awesome-data-pipeline:
Users who are interested in awesome-data-pipeline are comparing it to the libraries listed below:
- (GitBook) A curated list of awesome Data Engineering resources ☆34 · Updated 2 weeks ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆49 · Updated last year
- Apache Spark Guide ☆30 · Updated 2 years ago
- Full stack data engineering tools and infrastructure set-up ☆48 · Updated 3 years ago
- A curated list of awesome blogs, videos, tools and resources about Data Contracts ☆171 · Updated 5 months ago
- Spark data pipeline that processes movie ratings data. ☆27 · Updated last week
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ☆29 · Updated last year
- Data lake, data warehouse on GCP ☆55 · Updated 3 years ago
- ☆11 · Updated 2 years ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆133 · Updated 4 years ago
- System Design, Solution Architecture, Data Systems Practice ☆35 · Updated last month
- Apache Airflow advanced functionalities examples ☆14 · Updated 10 months ago
- Open Data Stack Projects: Examples of End to End Data Engineering Projects ☆73 · Updated last year
- A curated list of open source tools used in analytics platforms and data engineering ecosystem ☆195 · Updated 2 months ago
- A curated list of awesome DataOps tools ☆169 · Updated 3 months ago
- Code snippets for Data Engineering Design Patterns book ☆53 · Updated 3 weeks ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆48 · Updated 2 years ago
- Quick Guides from Dremio on Several topics ☆67 · Updated 2 weeks ago
- A modern ELT demo using airbyte, dbt, snowflake and dagster ☆26 · Updated 2 years ago
- A guide for leading a data (engineering) team ☆62 · Updated 8 months ago
- End-to-end data platform leveraging the Modern data stack ☆45 · Updated 9 months ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 · Updated last year
- New generation opensource data stack ☆65 · Updated 2 years ago
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, PostgreSQL and Superset ☆38 · Updated 2 months ago
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 · Updated 4 years ago
- build dw with dbt ☆35 · Updated 3 months ago
- Code for dbt tutorial ☆149 · Updated 8 months ago
- A curated list of awesome Databricks resources, including Spark ☆16 · Updated 7 months ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆65 · Updated 3 years ago
- Cloned by the `dbt init` task ☆60 · Updated 9 months ago