Generate a synthetic Spotify music stream dataset to create dashboards. The Spotify API generates fake event data, which is emitted to Kafka. Spark consumes and processes the Kafka data and saves it to the data lake. Airflow orchestrates the pipeline. dbt moves the data to Snowflake, transforms it, and creates dashboards.
☆72 · Dec 17, 2023 · Updated 2 years ago
Alternatives and similar repositories for spotify-stream-analytics
Users that are interested in spotify-stream-analytics are comparing it to the libraries listed below.
- ☆12 · May 27, 2024 · Updated 2 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆21 · Aug 12, 2025 · Updated 9 months ago
- Simple project using pyflink, kafka and postgre containerized using Docker ☆11 · Aug 26, 2024 · Updated last year
- ☆12 · Oct 10, 2023 · Updated 2 years ago
- This repository contains the capstone project carried out as part of Machine Learning Zoomcamp course ☆10 · Dec 26, 2022 · Updated 3 years ago
- This project is for demonstrating knowledge of Data Engineering tools and concepts and also learning in the process ☆44 · Dec 1, 2022 · Updated 3 years ago
- Local development environment for python data projects, with Docker ☆23 · Dec 14, 2022 · Updated 3 years ago
- ☆11 · Aug 10, 2023 · Updated 2 years ago
- ☆17 · Apr 19, 2024 · Updated 2 years ago
- This is a capstone project associated with MLOps Zoomcamp. The end goal of the project is to build an end-to-end machine learning projec… ☆13 · Sep 8, 2022 · Updated 3 years ago
- The goal of this project is to build an ETL pipeline. The data would be processed as a batch (monthly) between 2018-01 and 2021-02. ☆14 · Mar 26, 2022 · Updated 4 years ago
- Series follows learning from Apache Spark (PySpark) with quick tips and workaround for daily problems in hand ☆55 · Sep 30, 2023 · Updated 2 years ago
- Here I will be exploring various tools and methods that are used in data engineering process with Python. ☆21 · Jan 4, 2021 · Updated 5 years ago
- Code test for data engineering candidates ☆47 · Mar 27, 2024 · Updated 2 years ago
- Data Science Intern at Data Glacier ☆12 · Jun 30, 2022 · Updated 3 years ago
- 🤖 An autonomous AI agent system that collaboratively designs, implements, and manages Apache Airflow DAGs through natural language inter… ☆28 · Aug 6, 2025 · Updated 10 months ago
- Candace's Data Engineering Zoomcamp files and notes ☆18 · Jul 4, 2023 · Updated 2 years ago
- This repo consists of all important concepts for data engineers. ☆11 · Jun 2, 2026 · Updated last week
- ☆14 · Aug 28, 2024 · Updated last year
- An AWS Data Engineering End-to-End Project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3) ☆16 · Sep 20, 2023 · Updated 2 years ago
- This project demonstrates how to build and automate an ETL pipeline written in Python and schedule it using open source Apache Airflow or… ☆24 · Aug 21, 2025 · Updated 9 months ago
- Data Augmentation with Python, published by Packt ☆37 · Oct 28, 2024 · Updated last year
- Code for my "Efficient Data Processing in SQL" book. ☆62 · Aug 6, 2024 · Updated last year
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 · Jun 12, 2024 · Updated last year
- A sphinx extension for adding pyscript to a page ☆15 · Jun 1, 2026 · Updated last week
- 📦 Starting box for Vagrant. Inside box Ubuntu 20.04 LTS with Git, Docker and Docker compose. ☆19 · May 5, 2022 · Updated 4 years ago
- AWS LocalStack + Spark Cluster + Zeppelin [Docker] ☆10 · Jul 6, 2022 · Updated 3 years ago
- (Python, PySpark) ☆11 · Nov 15, 2020 · Updated 5 years ago
- Analyzing the most strategic words to guess on Wordle, based on letter frequency distributions ☆11 · Feb 20, 2022 · Updated 4 years ago
- Scripts used to setup a Spark cluster on EC2 ☆21 · Mar 24, 2016 · Updated 10 years ago
- This repo via a real world use case, shows how to launch dbt models from a DAG in Apache Airflow. ☆14 · Apr 22, 2026 · Updated last month
- Set of Jupyter notebooks demonstrating Learning to Rank integrated with Solr and Elasticsearch ☆17 · Jun 19, 2022 · Updated 3 years ago
- Apache Hadoop - Docker distribution based on CentOS 7 and Oracle Java 8 ☆12 · Feb 20, 2018 · Updated 8 years ago
- ☆21 · Nov 4, 2023 · Updated 2 years ago
- Tutorial for running Django on Azure ☆16 · Feb 7, 2026 · Updated 4 months ago
- Building Recommender System with the Two-Tower Architecture ☆18 · Aug 10, 2021 · Updated 4 years ago
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆51 · Sep 7, 2023 · Updated 2 years ago
- ☆12 · Jul 8, 2024 · Updated last year
- ☆16 · May 29, 2023 · Updated 3 years ago