A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
☆23Nov 19, 2024Updated last year
Alternatives and similar repositories for Youtube-Recommend-Master-ETL-Pipeline
Users that are interested in Youtube-Recommend-Master-ETL-Pipeline are comparing it to the libraries listed below.
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆44Apr 22, 2023Updated 3 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆20Aug 12, 2025Updated 9 months ago
- NoSQL extract, transform, load (ETL) toolkit with Python☆16May 9, 2026Updated 3 weeks ago
- ☆27Jan 21, 2026Updated 4 months ago
- ☆30Feb 11, 2024Updated 2 years ago
- A simple demo showing how to use Ably and fastAPI to route messages into Kafka for stream processing☆16Oct 12, 2021Updated 4 years ago
- DataTalks.Club's Data Engineering Zoomcamp Project☆24Jul 14, 2022Updated 3 years ago
- Data Guy Story commandline☆11Dec 2, 2022Updated 3 years ago
- An end-to-end data pipeline extracting music listening habits and producing an insightful dashboard☆18Mar 31, 2024Updated 2 years ago
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time.☆14Apr 15, 2026Updated last month
- My Setup Development Environment as Data Engineer☆40Aug 5, 2025Updated 9 months ago
- Create agents in PHP that monitor and act on your behalf. A Laravel based Huginn port.☆13Jan 4, 2023Updated 3 years ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa…☆29Jun 7, 2023Updated 2 years ago
- Source code for 'Pro Power BI Desktop' by Adam Aspin☆13Mar 28, 2017Updated 9 years ago
- Cool DE Projects☆73Mar 22, 2026Updated 2 months ago
- Fivetran's Jira source dbt package☆14Oct 1, 2025Updated 7 months ago
- An example of a Dagster project with a possible folder structure to organize the assets, jobs, repositories, schedules, and ops. Also has…☆102Nov 3, 2024Updated last year
- 🚀 Complete AWS learning path for beginners - 45K+ community resource with hands-on labs, workshops, and certification guides☆18Apr 28, 2026Updated last month
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin…☆15Apr 29, 2021Updated 5 years ago
- ELT for AEMET weather data.☆16Mar 23, 2025Updated last year
- In this project I have built an ETL pipeline which scrapes the trending repository based on month, week and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- ☆11Nov 18, 2022Updated 3 years ago
- This provider contains operators, decorators and triggers to send a ray job from an airflow task☆25Oct 27, 2025Updated 7 months ago
- Docktor is a Web App that deploys an easy-to-use kit of analysis and scanning tools.☆14Nov 1, 2023Updated 2 years ago
- ☆16Mar 15, 2024Updated 2 years ago
- End to end data pipeline to extract and analyze submissions from any subreddit using Pushshift, python, dbt and BigQuery.☆12Jul 17, 2023Updated 2 years ago
- A Python Snowpark CLI for loading the TPC-DI dataset into Snowflake. Additional dbt models for building the data warehouse.☆11Sep 4, 2025Updated 8 months ago
- Source code for 'Power Query for Power BI and Excel' by Christopher Webb and Crossjoin Consulting Limited☆19Aug 18, 2017Updated 8 years ago
- Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped☆49Mar 13, 2026Updated 2 months ago
- Repo for learning DBT with Snowflake, featuring projects and models for data transformation and automation☆26Mar 31, 2025Updated last year
- Skooldio: Data Pipelines with Airflow☆23May 24, 2025Updated last year
- ☆11Dec 28, 2020Updated 5 years ago
- ☆14May 1, 2024Updated 2 years ago
- Data Engineering with AWS Cookbook, published by Packt☆26Apr 13, 2026Updated last month
- Analytics engineering with dbt - projects and developer environment☆22Sep 27, 2024Updated last year
- Code to demonstrate data engineering metadata & logging best practices☆21Mar 12, 2024Updated 2 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆67Sep 23, 2023Updated 2 years ago
- Welcome to my data engineering projects repository! Here you will find a collection of data engineering projects that I have worked on.☆24Apr 27, 2023Updated 3 years ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆78Sep 2, 2023Updated 2 years ago