KennethanCeyer / awesome-data-pipeline
Awesome list for datapipeline
☆34 Updated 2 years ago
Alternatives and similar repositories for awesome-data-pipeline:
Users who are interested in awesome-data-pipeline are comparing it to the repositories listed below.
- (GitBook) A curated list of awesome Data Engineering resources ☆35 Updated 3 weeks ago
- Full stack data engineering tools and infrastructure set-up ☆51 Updated 4 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆56 Updated last year
- A curated list of awesome Databricks resources, including Spark ☆17 Updated 9 months ago
- Code snippets for Data Engineering Design Patterns book ☆80 Updated last month
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 2 years ago
- A modern ELT demo using airbyte, dbt, snowflake and dagster ☆27 Updated 2 years ago
- A curated list of awesome DataOps tools ☆185 Updated 6 months ago
- Spark data pipeline that processes movie ratings data. ☆28 Updated 3 weeks ago
- Apache Spark Guide ☆31 Updated 3 years ago
- Curated list of resources about Apache Airflow ☆19 Updated 4 years ago
- Delta Lake Documentation ☆49 Updated 10 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆76 Updated 2 years ago
- ☆11 Updated 5 months ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆60 Updated 3 months ago
- A simple Data Engineering solution for testing or education purposes. You only need to know SQL and Python to understand this project. Da… ☆25 Updated 2 years ago
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a… ☆25 Updated last year
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆66 Updated 3 years ago
- ☆17 Updated 8 months ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data ☆32 Updated 3 years ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆26 Updated 8 months ago
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… ☆23 Updated 5 months ago
- Finance Data Builder @ postgres ☆21 Updated 4 years ago
- ☆12 Updated 3 years ago
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆219 Updated this week
- Data Tools Subjective List ☆83 Updated last year
- DataHub on AWS demonstration resources ☆10 Updated 2 years ago
- Run, schedule, and manage your dbt jobs using Kubernetes. ☆24 Updated 6 years ago
- ☆46 Updated last week
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, PostgreSQL and Superset ☆40 Updated 5 months ago