Generates a synthetic Spotify music-stream dataset for building dashboards. The Spotify API generates fake event data, which is emitted to Kafka. Spark consumes and processes the Kafka data and saves it to the data lake. Airflow orchestrates the pipeline. dbt moves the data to Snowflake, transforms it, and creates dashboards.
☆71 · Dec 17, 2023 · Updated 2 years ago
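The first stage described above (fake listening events that get emitted to Kafka) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the event fields, the `TRACKS` catalogue, and the function names `make_stream_event` and `serialize` are all assumptions made for the example, and no Kafka client is used. The sketch only builds the JSON bytes that a producer would typically send as a message value.

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical track catalogue; the real project's schema is not shown in the source.
TRACKS = [
    {"track_id": "t1", "title": "Song A", "duration_ms": 210_000},
    {"track_id": "t2", "title": "Song B", "duration_ms": 185_000},
    {"track_id": "t3", "title": "Song C", "duration_ms": 242_000},
]


def make_stream_event(rng: random.Random, user_count: int = 100) -> dict:
    """Build one fake 'track played' event (field names are assumptions)."""
    track = rng.choice(TRACKS)
    return {
        # Derive the id from the seeded RNG so runs are reproducible.
        "event_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "user_id": rng.randrange(user_count),
        "track_id": track["track_id"],
        # A listener may skip partway through, so play time is at most the track length.
        "ms_played": rng.randint(0, track["duration_ms"]),
        "ts": datetime.now(timezone.utc).isoformat(),
    }


def serialize(event: dict) -> bytes:
    """Encode the event as UTF-8 JSON, a common shape for a Kafka message value."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(serialize(make_stream_event(rng)).decode("utf-8"))
```

In a real pipeline the bytes returned by `serialize` would be handed to a Kafka producer, and a Spark job would parse the same JSON on the consuming side. Both steps are left out here because the source does not show how the project configures them.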
Alternatives and similar repositories for spotify-stream-analytics
Users that are interested in spotify-stream-analytics are comparing it to the libraries listed below.
- ☆12 · May 27, 2024 · Updated last year
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆20 · Aug 12, 2025 · Updated 8 months ago
- Simple project using PyFlink, Kafka, and PostgreSQL, containerized using Docker ☆11 · Aug 26, 2024 · Updated last year
- This Power BI project provides insights into customer orders and product tracking using interactive dashboards. It visualizes order statu… ☆10 · Aug 15, 2025 · Updated 8 months ago
- This project is for demonstrating knowledge of data engineering tools and concepts, and for learning in the process ☆44 · Dec 1, 2022 · Updated 3 years ago
- Local development environment for Python data projects, with Docker ☆23 · Dec 14, 2022 · Updated 3 years ago
- ☆11 · Aug 10, 2023 · Updated 2 years ago
- This is a capstone project associated with MLOps Zoomcamp. The end goal of the project is to build an end-to-end machine learning projec… ☆13 · Sep 8, 2022 · Updated 3 years ago
- The goal of this project is to build an ETL pipeline. The data is processed as a monthly batch between 2018-01 and 2021-02. ☆14 · Mar 26, 2022 · Updated 4 years ago
- End-to-end data platform leveraging the modern data stack ☆52 · Apr 10, 2024 · Updated 2 years ago
- A series that follows learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆55 · Sep 30, 2023 · Updated 2 years ago
- Here I will be exploring various tools and methods used in the data engineering process with Python. ☆21 · Jan 4, 2021 · Updated 5 years ago
- 🤖 An autonomous AI agent system that collaboratively designs, implements, and manages Apache Airflow DAGs through natural language inter… ☆28 · Aug 6, 2025 · Updated 8 months ago
- Candace's Data Engineering Zoomcamp files and notes ☆18 · Jul 4, 2023 · Updated 2 years ago
- This repo contains all the important concepts for data engineers. ☆11 · Dec 24, 2024 · Updated last year
- A command-line client builder that follows Canonical's guidelines for a command-line interface. ☆15 · Updated this week
- A project management CLI written purely in SQL. ☆17 · Dec 31, 2024 · Updated last year
- An open and introductory book for the Python API of Apache Spark (pyspark) 📚📖 ☆12 · Sep 19, 2025 · Updated 7 months ago
- This project demonstrates how to build and automate an ETL pipeline written in Python and schedule it using open source Apache Airflow or… ☆20 · Aug 21, 2025 · Updated 8 months ago
- Code for my "Efficient Data Processing in SQL" book. ☆62 · Aug 6, 2024 · Updated last year
- To understand and learn a little about Apache Kafka. https://www.youtube.com/channel/UC3pevgVzUWKo5CoWdhDsoHw ☆13 · Mar 10, 2026 · Updated last month
- 📦 Starter box for Vagrant. The box contains Ubuntu 20.04 LTS with Git, Docker, and Docker Compose. ☆19 · May 5, 2022 · Updated 3 years ago
- A Sphinx extension for adding PyScript to a page ☆15 · Updated this week
- AWS LocalStack + Spark Cluster + Zeppelin [Docker] ☆10 · Jul 6, 2022 · Updated 3 years ago
- The goal of this project is to analyse the impact of Covid-19 on the aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- Data Engineer Project: An end-to-end Airflow data pipeline with BigQuery, dbt, Soda, and more! ☆12 · Dec 14, 2023 · Updated 2 years ago
- GitHub Actions to validate DAGs, variables, and dependencies upon pull request ☆23 · Mar 5, 2026 · Updated last month
- ☆15 · Oct 19, 2023 · Updated 2 years ago
- This repo shows, via a real-world use case, how to launch dbt models from a DAG in Apache Airflow. ☆14 · Apr 22, 2026 · Updated last week
- Set of Jupyter notebooks demonstrating Learning to Rank integrated with Solr and Elasticsearch ☆17 · Jun 19, 2022 · Updated 3 years ago
- ☆22 · Feb 5, 2024 · Updated 2 years ago
- Tutorial for running Django on Azure ☆16 · Feb 7, 2026 · Updated 2 months ago
- ☆21 · Oct 21, 2024 · Updated last year
- ☆16 · May 29, 2023 · Updated 2 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP! ☆12 · Jul 6, 2023 · Updated 2 years ago
- ☆16 · Feb 17, 2020 · Updated 6 years ago
- This project provides valuable customer sentiment insights for Zomato by tracking and analyzing tweets related to their brand and service… ☆14 · Aug 27, 2023 · Updated 2 years ago
- Grafana dashboards ☆26 · Apr 8, 2021 · Updated 5 years ago
- ChartMogul API Python Client ☆17 · Apr 17, 2026 · Updated last week