KennethanCeyer / awesome-data-pipeline
Awesome list for data pipelines
★28 · Updated last year
Related projects
Alternatives and complementary repositories for awesome-data-pipeline
- New generation opensource data stack ★61 · Updated 2 years ago
- (GitBook) A curated list of awesome Data Engineering resources ★34 · Updated 3 weeks ago
- A curated list of awesome blogs, videos, tools and resources about Data Contracts ★166 · Updated 3 months ago
- Code snippets for Data Engineering Design Patterns book ★40 · Updated last week
- A curated list of awesome DataOps tools ★158 · Updated last month
- Spark data pipeline that processes movie ratings data. ★27 · Updated last week
- Design/Implement stream/batch architecture on NYC taxi data | #DE ★26 · Updated 3 years ago
- ★15 · Updated 3 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ★44 · Updated last year
- Quick Guides from Dremio on Several topics ★65 · Updated 3 weeks ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ★47 · Updated 2 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ★92 · Updated 3 weeks ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ★111 · Updated last week
- A Snowflake GPT Demo using SQLAlchemy ★23 · Updated last year
- A curated list of dagster code snippets for data engineers ★52 · Updated 8 months ago
- Docker Airflow - Contains a docker compose file for Airflow 2.0 ★59 · Updated 2 years ago
- Full stack data engineering tools and infrastructure set-up ★44 · Updated 3 years ago
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time. ★13 · Updated last week
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ★21 · Updated 2 years ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, .... ★72 · Updated this week
- A repository of sample code to show data quality checking best practices using Airflow. ★72 · Updated last year
- ★32 · Updated 8 months ago
- Curated list of resources about Apache Airflow ★19 · Updated 3 years ago
- A curated list of awesome Databricks resources, including Spark ★14 · Updated 4 months ago
- Data Tools Subjective List ★80 · Updated last year
- Sample configuration to deploy a modern data platform. ★86 · Updated 2 years ago
- Make dbt great again! Enables end user to extend dbt to his/her needs ★13 · Updated this week
- Open Data Stack Projects: Examples of End to End Data Engineering Projects ★71 · Updated last year
- This is a VS Code extension for Apache Airflow ★20 · Updated 11 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ★62 · Updated last month