A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
☆23 · Nov 19, 2024 · Updated last year
Alternatives and similar repositories for Youtube-Recommend-Master-ETL-Pipeline
Users who are interested in Youtube-Recommend-Master-ETL-Pipeline are comparing it to the repositories listed below.
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar… ☆43 · Apr 22, 2023 · Updated 2 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆20 · Aug 12, 2025 · Updated 7 months ago
- End-to-end ELT data engineering project ☆22 · Dec 24, 2022 · Updated 3 years ago
- A quickstart tool for creating a FastAPI project with Jinja2, TailwindCSS, Flowbite, HTMX, and AlpineJS. ☆13 · Jun 23, 2025 · Updated 9 months ago
- Source Code for 'Beginning Blockchain' by Bikramaditya Singhal, Gautam Dhameja, and Priyansu Sekhar Panda ☆10 · May 17, 2024 · Updated last year
- ☆30 · Feb 11, 2024 · Updated 2 years ago
- Scan and monitor your network effortlessly! Nmap Prometheus Exporter provides insights into network health and security with Prometheus-c… ☆15 · Oct 2, 2023 · Updated 2 years ago
- A simple demo showing how to use Ably and fastAPI to route messages into Kafka for stream processing ☆16 · Oct 12, 2021 · Updated 4 years ago
- DataTalks.Club's Data Engineering Zoomcamp Project ☆24 · Jul 14, 2022 · Updated 3 years ago
- Simple ETL pipeline using Python ☆29 · May 22, 2023 · Updated 2 years ago
- Data Guy Story commandline ☆11 · Dec 2, 2022 · Updated 3 years ago
- an end-to-end data pipeline extracting music listening habits and producing an insightful dashboard ☆17 · Mar 31, 2024 · Updated last year
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time. ☆13 · Mar 1, 2026 · Updated 3 weeks ago
- My Setup Development Environment as Data Engineer ☆36 · Aug 5, 2025 · Updated 7 months ago
- 🌟 An end-to-end full-stack data science project, including modelling, MLOps, and data storytelling. ✨ ☆16 · Aug 30, 2025 · Updated 7 months ago
- Create agents in PHP that monitor and act on your behalf. A Laravel based Huginn port. ☆13 · Jan 4, 2023 · Updated 3 years ago
- My notes from the @makersacademy course. ☆23 · Apr 10, 2015 · Updated 10 years ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa… ☆29 · Jun 7, 2023 · Updated 2 years ago
- Source code for 'Pro Power BI Desktop' by Adam Aspin ☆13 · Mar 28, 2017 · Updated 9 years ago
- Fivetran's Jira source dbt package ☆14 · Oct 1, 2025 · Updated 5 months ago
- An example of a Dagster project with a possible folder structure to organize the assets, jobs, repositories, schedules, and ops. Also has… ☆102 · Nov 3, 2024 · Updated last year
- ELT for AEMET weather data. ☆16 · Mar 23, 2025 · Updated last year
- In this project I have built an ETL pipeline which scrapes the trending repositories based on month, week and day LIVE extract other related info… ☆12 · Sep 9, 2023 · Updated 2 years ago
- This extension makes vscode seamlessly work with dbt and bigquery ☆15 · Sep 27, 2022 · Updated 3 years ago
- API/Data Platform for Ingesting, Storing, and Serving Data through Postgres, and Litestar ☆11 · Jan 18, 2026 · Updated 2 months ago
- Performant, highly available distributed storage using SeaweedFS in Docker Swarm ☆15 · Jan 10, 2023 · Updated 3 years ago
- SQL Server 2017 Integration Services Cookbook, published by Packt ☆17 · Jan 30, 2023 · Updated 3 years ago
- Docktor is a Web App that deploys an easy-to-use kit of analysis and scanning tools. ☆13 · Nov 1, 2023 · Updated 2 years ago
- ☆15 · Mar 15, 2024 · Updated 2 years ago
- End to end data pipeline to extract and analyze submissions from any subreddit using Pushshift, python, dbt and BigQuery. ☆12 · Jul 17, 2023 · Updated 2 years ago
- The AI models used for my personal purposes and their usage (Gemini, Copilot, Dialogflow,...) ☆19 · Apr 5, 2024 · Updated last year
- A dataset for Movie Recommendation with NebulaGraph, ETL to merge two dataset: OMDB & Movielens with dbt, postgres and Nebula-Importer ☆17 · Sep 7, 2024 · Updated last year
- Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped ☆49 · Mar 13, 2026 · Updated 2 weeks ago
- ☆11 · Dec 28, 2020 · Updated 5 years ago
- ☆13 · May 1, 2024 · Updated last year
- Data Engineering with AWS Cookbook, published by Packt ☆24 · Dec 1, 2024 · Updated last year
- This repository contains a Docker Compose configuration for running ScyllaDB, a highly scalable NoSQL database for learning and testing. ☆13 · Sep 19, 2024 · Updated last year
- Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP" ☆12 · May 9, 2023 · Updated 2 years ago
- Code to demonstrate data engineering metadata & logging best practices ☆21 · Mar 12, 2024 · Updated 2 years ago