A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
☆24Nov 19, 2024Updated last year
Alternatives and similar repositories for Youtube-Recommend-Master-ETL-Pipeline
Users who are interested in Youtube-Recommend-Master-ETL-Pipeline are comparing it to the libraries listed below.
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆44Apr 22, 2023Updated 3 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆21Aug 12, 2025Updated 10 months ago
- NoSQL extract, transform, load (ETL) toolkit with Python☆16Jun 11, 2026Updated last week
- End-to-end ELT data engineering project☆23Dec 24, 2022Updated 3 years ago
- ☆30Feb 11, 2024Updated 2 years ago
- A simple demo showing how to use Ably and FastAPI to route messages into Kafka for stream processing☆16Oct 12, 2021Updated 4 years ago
- DataTalks.Club's Data Engineering Zoomcamp Project☆24Jul 14, 2022Updated 3 years ago
- Simple ETL pipeline using Python☆29May 22, 2023Updated 3 years ago
- Data Guy Story commandline☆11Dec 2, 2022Updated 3 years ago
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time.☆14Apr 15, 2026Updated 2 months ago
- My development environment setup as a data engineer☆40Aug 5, 2025Updated 10 months ago
- 🌟 An end-to-end full-stack data science project, including modelling, MLOps, and data storytelling. ✨☆16Aug 30, 2025Updated 9 months ago
- My notes from the @makersacademy course.☆23Apr 10, 2015Updated 11 years ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa…☆29Jun 7, 2023Updated 3 years ago
- Cool DE Projects☆73Mar 22, 2026Updated 2 months ago
- Fivetran's Jira source dbt package☆14Oct 1, 2025Updated 8 months ago
- An example of a Dagster project with a possible folder structure to organize the assets, jobs, repositories, schedules, and ops. Also has…☆101Nov 3, 2024Updated last year
- StarCraft 2 Data Pipeline with Airflow, DuckDB and Streamlit☆16Mar 14, 2024Updated 2 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin…☆15Apr 29, 2021Updated 5 years ago
- ELT for AEMET weather data.☆16Mar 23, 2025Updated last year
- In this project I have built an ETL pipeline which scrapes the trending repositories by month, week, and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- This provider contains operators, decorators and triggers to send a ray job from an airflow task☆25Jun 10, 2026Updated last week
- API/Data Platform for Ingesting, Storing, and Serving Data through Postgres, and Litestar☆11Apr 25, 2026Updated last month
- Performant, highly available distributed storage using SeaweedFS in Docker Swarm☆16Jan 10, 2023Updated 3 years ago
- SQL Server 2017 Integration Services Cookbook, published by Packt☆17Jan 30, 2023Updated 3 years ago
- Docktor is a Web App that deploys an easy-to-use kit of analysis and scanning tools.☆14Nov 1, 2023Updated 2 years ago
- A Python Snowpark CLI for loading the TPC-DI dataset into Snowflake. Additional dbt models for building the data warehouse.☆11Sep 4, 2025Updated 9 months ago
- Source code for 'Pro Power BI Desktop' by Adam Aspin☆22Dec 4, 2017Updated 8 years ago
- Source code for 'Power Query for Power BI and Excel' by Christopher Webb and Crossjoin Consulting Limited☆19Aug 18, 2017Updated 8 years ago
- Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped☆49Jun 7, 2026Updated last week
- Repo for learning DBT with Snowflake, featuring projects and models for data transformation and automation☆26Mar 31, 2025Updated last year
- Skooldio: Data Pipelines with Airflow☆23May 24, 2025Updated last year
- Fivetran's social media reporting dbt package. Combine your Facebook Pages, Instagram Business, Twitter Organic, and LinkedIn Pages socia…☆25Jun 11, 2026Updated last week
- ☆11Dec 28, 2020Updated 5 years ago
- ☆15May 1, 2024Updated 2 years ago
- Data Engineering with AWS Cookbook, published by Packt☆26Apr 13, 2026Updated 2 months ago
- A fully serverless, event-driven data pipeline that ingests, enriches, validates, and visualizes real-time news data using AWS services. …☆25Aug 10, 2025Updated 10 months ago
- Analytics engineering with dbt - projects and developer environment☆22Sep 27, 2024Updated last year
- This repository contains a Docker Compose configuration for running ScyllaDB, a highly scalable NoSQL database for learning and testing.☆14Sep 19, 2024Updated last year