bytewax / awesome-public-real-time-datasets
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
☆2,289 · Dec 21, 2025 · Updated last month
Alternatives and similar repositories for awesome-public-real-time-datasets
Users who are interested in awesome-public-real-time-datasets are comparing it to the repositories listed below.
- A list of free datasets that provide streaming data ☆433 · May 16, 2024 · Updated last year
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆94 · Jan 21, 2024 · Updated 2 years ago
- Mockingbird is a mock streaming data generator ☆134 · Feb 6, 2025 · Updated last year
- Python Stream Processing ☆1,958 · Mar 27, 2025 · Updated 10 months ago
- An Awesome List of Open-Source Data Engineering Projects ☆3,016 · Oct 4, 2024 · Updated last year
- This is a repo with links to everything you'd ever want to learn about data engineering ☆40,137 · Dec 15, 2025 · Updated 2 months ago
- Analyzing hacker news in real-time with Bytewax and Proton ☆45 · Jan 25, 2024 · Updated 2 years ago
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo… ☆38,379 · Updated this week
- rust-for-data ☆50 · Jul 12, 2023 · Updated 2 years ago
- A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard. ☆251 · Dec 19, 2025 · Updated last month
- dbt docs but windows 95 ☆16 · Jun 7, 2022 · Updated 3 years ago
- A topic-centric list of HQ open datasets. ☆72,724 · Jan 30, 2026 · Updated 2 weeks ago
- Data Engineering Practice Problems ☆2,533 · Jan 8, 2025 · Updated last year
- Mock streaming data generator ☆17 · May 31, 2024 · Updated last year
- Sample project to demonstrate data engineering best practices ☆203 · Feb 24, 2024 · Updated last year
- Practical Data Engineering: A Hands-On Real-Estate Project Guide ☆769 · Sep 3, 2024 · Updated last year
- ☆20 · Jun 28, 2023 · Updated 2 years ago
- Demo Project for Open Source MDS ☆170 · Aug 27, 2025 · Updated 5 months ago
- Transaction processing & vis pipeline using PySpark Streaming ☆30 · Jul 18, 2024 · Updated last year
- Analyze coinbase orderbook in real-time in Python with Bytewax ☆11 · Apr 23, 2024 · Updated last year
- End-to-End ELT data pipeline with Postgres, Airbyte, dbt, Dagster, Snowflake and Metabase ☆11 · Jul 13, 2023 · Updated 2 years ago
- Repository for Data Engineering Interview Series ☆36 · Oct 17, 2024 · Updated last year
- ☆111 · Jan 15, 2025 · Updated last year
- Python Streaming DataFrames for Kafka ☆1,519 · Updated this week
- The best place to learn data engineering. Built and maintained by the data engineering community. ☆1,885 · Jan 31, 2026 · Updated 2 weeks ago
- dbt starter code for enterprise Snowflake usage data artifacts ☆21 · Sep 7, 2022 · Updated 3 years ago
- ☆384 · Jan 26, 2025 · Updated last year
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more ☆391 · Nov 28, 2023 · Updated 2 years ago
- A simple playground for dbt with the sqlite connector ☆12 · May 22, 2022 · Updated 3 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de… ☆11 · Nov 18, 2023 · Updated 2 years ago
- A list of all awesome open-source contributions for the Apache Kafka project ☆108 · Jul 10, 2023 · Updated 2 years ago
- This is a template you can use for your next data engineering portfolio project. ☆187 · Sep 10, 2021 · Updated 4 years ago
- A curated list of data engineering tools for software developers ☆8,281 · Feb 10, 2026 · Updated last week
- Searchbar widget for your R Shiny application ☆15 · Jun 8, 2020 · Updated 5 years ago
- 📦 Serverless and local-first Open Data Platform ☆306 · Jan 22, 2026 · Updated 3 weeks ago
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset ☆260 · Dec 13, 2025 · Updated 2 months ago
- Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plu… ☆59 · Mar 29, 2023 · Updated 2 years ago
- An end-to-end batch scoring machine learning system that produces hourly predictions of the number of arrivals and departures that will t… ☆26 · Oct 26, 2025 · Updated 3 months ago
- data load tool (dlt) is an open source Python library that makes data loading easy 🛠️ ☆4,903 · Feb 11, 2026 · Updated last week