sspaeti / data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
☆10Updated 4 months ago
Alternatives and similar repositories for data-engineer-handbook:
Users who are interested in data-engineer-handbook are comparing it to the repositories listed below
- Full stack data engineering tools and infrastructure set-up☆51Updated 4 years ago
- Code for my "Efficient Data Processing in SQL" book.☆56Updated 8 months ago
- Some example projects for Data Engineers to build, end-to-end.☆28Updated last year
- A modern ELT demo using Airbyte, dbt, Snowflake and Dagster☆27Updated 2 years ago
- Code to demonstrate data engineering metadata & logging best practices☆16Updated last year
- Code snippets for Data Engineering Design Patterns book☆80Updated last month
- Example repo to create end-to-end tests for a data pipeline.☆23Updated 10 months ago
- Cost Efficient Data Pipelines with DuckDB☆52Updated 8 months ago
- ☆16Updated last year
- Code for dbt tutorial☆156Updated 10 months ago
- Open Data Stack Projects: Examples of End to End Data Engineering Projects☆82Updated last year
- Code for blog at: https://www.startdataengineering.com/post/docker-for-de/☆36Updated 11 months ago
- A custom end-to-end analytics platform for customer churn☆11Updated 3 months ago
- Apache Airflow advanced functionalities examples☆17Updated last year
- Awesome list for data pipelines☆34Updated 2 years ago
- Simple stream processing pipeline☆101Updated 10 months ago
- ☆36Updated last month
- End to end data engineering project☆54Updated 2 years ago
- This is a demo streaming project simulating a music streaming service.☆35Updated 8 months ago
- Analytics Engineering best practices and standards used at Hiflylabs☆12Updated last month
- Local development environment for python data projects, with Docker☆23Updated 2 years ago
- This is a real-life, high throughput streaming ELT data pipeline for ecommerce☆13Updated last year
- A repository of sample code to show data quality checking best practices using Airflow.☆76Updated 2 years ago
- PySpark boilerplate for running a production-ready data pipeline☆28Updated 4 years ago
- ☆17Updated 8 months ago
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a…☆32Updated last year
- Jinja cheatsheet for dbt development☆38Updated 2 years ago
- (project & tutorial) dag pipeline tests + ci/cd setup☆87Updated 4 years ago
- A "modern" Strava data pipeline fueled by dlt, duckdb, dbt, and evidence.dev☆32Updated 3 months ago
- Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, DBT, Mage AI, and Dash.☆14Updated last year