abeltavares / batch-data-pipeline
📦 Batch data pipeline with Airflow, DuckDB, Delta Lake, Trino, MinIO, and Metabase. Full observability and data quality.
⭐ 80 · Updated last month
Alternatives and similar repositories for batch-data-pipeline
Users who are interested in batch-data-pipeline are comparing it to the repositories listed below:
- A template repository to create a data project with IaC, CI/CD, data migrations, and testing · ⭐ 280 · Updated last year
- Local Environment to Practice Data Engineering · ⭐ 143 · Updated 11 months ago
- Sample project to demonstrate data engineering best practices · ⭐ 202 · Updated last year
- End to end data engineering project · ⭐ 57 · Updated 3 years ago
- A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard. · ⭐ 251 · Updated last year
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker · ⭐ 107 · Updated 8 months ago
- Code for "Efficient Data Processing in Spark" Course · ⭐ 349 · Updated last month
- Code for "Advanced data transformations in SQL" free live workshop · ⭐ 88 · Updated 7 months ago
- My notes of the Data Engineering Zoomcamp by DataTalksClub · ⭐ 37 · Updated 2 years ago
- (no description) · ⭐ 144 · Updated 2 years ago
- This is the repo of the Weather app from my YouTube video · ⭐ 20 · Updated 2 years ago
- Sample repo for startdataengineering DE 101 free course · ⭐ 71 · Updated last year
- This is a template you can use for your next data engineering portfolio project. · ⭐ 183 · Updated 4 years ago
- (no description) · ⭐ 15 · Updated last year
- Data pipeline that scrapes Rust cheater Steam profiles · ⭐ 54 · Updated 3 years ago
- In this repository we store all materials for dlt workshops, courses, etc. · ⭐ 242 · Updated this week
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… · ⭐ 20 · Updated 4 months ago
- Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake · ⭐ 245 · Updated 5 months ago
- (no description) · ⭐ 120 · Updated 4 months ago
- Nyc_Taxi_Data_Pipeline - DE Project · ⭐ 129 · Updated last year
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more · ⭐ 370 · Updated 2 years ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/ · ⭐ 92 · Updated last year
- Practical Data Engineering: A Hands-On Real-Estate Project Guide · ⭐ 724 · Updated last year
- Repository for Data Engineering Zoomcamp 2024 · ⭐ 14 · Updated last year
- (no description) · ⭐ 162 · Updated 3 years ago
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… · ⭐ 47 · Updated last year
- This repository will contain all of the resources for the Mage component of the Data Engineering Zoomcamp: https://github.com/DataTalksCl… · ⭐ 101 · Updated last year
- Produce Kafka messages, consume them, and upload them into Cassandra and MongoDB. · ⭐ 42 · Updated 2 years ago
- Near real time ETL to populate a dashboard. · ⭐ 73 · Updated 3 months ago
- Write a CSV file to Postgres, read the table, and modify it; then write more tables to Postgres with Airflow. · ⭐ 37 · Updated 2 years ago