A list of free datasets that provide streaming data
☆437 · May 16, 2024 · Updated last year
Alternatives and similar repositories for awesome-public-streaming-datasets
Users that are interested in awesome-public-streaming-datasets are comparing it to the libraries listed below.
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆94 · Jan 21, 2024 · Updated 2 years ago
- A list of publicly available datasets with real-time data maintained by the team at bytewax.io ☆2,383 · Updated this week
- ☆14 · May 5, 2023 · Updated 2 years ago
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆537 · Jan 27, 2026 · Updated 2 months ago
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! ☆865 · Apr 16, 2022 · Updated 3 years ago
- Source code for the gx cloud agent ☆10 · Mar 23, 2026 · Updated last week
- Simplify Big Data Analytics with Amazon EMR, published by Packt ☆13 · Jan 18, 2023 · Updated 3 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆46 · Dec 11, 2023 · Updated 2 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP! ☆12 · Jul 6, 2023 · Updated 2 years ago
- Hudi Demo Notebook ☆11 · Mar 5, 2024 · Updated 2 years ago
- Trying out the Dataframe Polars library with Delta Lake ... feat Python. ☆12 · Jan 29, 2025 · Updated last year
- A walkthrough of setting up a Kinesis Data Analytics for Java Application which ingests streaming JSON data and leverages the Flink Table … ☆16 · Aug 30, 2023 · Updated 2 years ago
- Kubernetes deployment of PrestoDB, Hive Metastore, and Minio S3-standard object store ☆17 · Oct 20, 2022 · Updated 3 years ago
- Sample code for building a Python application for Apache Flink on Kinesis Data Analytics. ☆14 · Aug 30, 2023 · Updated 2 years ago
- This is a sample project that demonstrates an ETL (Extract, Transform, Load) process using Python, Polars and AWS Loca… ☆15 · Sep 25, 2023 · Updated 2 years ago
- Mockingbird is a mock streaming data generator ☆135 · Feb 6, 2025 · Updated last year
- ☆10 · Mar 12, 2021 · Updated 5 years ago
- CLI tool to manage Kafka connectors ☆10 · Mar 2, 2024 · Updated 2 years ago
- A tool to generate PySpark schema from JSON. ☆28 · Jan 21, 2024 · Updated 2 years ago
- A python tool scraping Aiven services metadata and building a connected graph ☆15 · Aug 28, 2025 · Updated 7 months ago
- A topic-centric list of HQ open datasets. ☆73,729 · Mar 17, 2026 · Updated last week
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo… ☆39,324 · Mar 19, 2026 · Updated last week
- End-to-End ELT data pipeline with Postgres, Airbyte, dbt, Dagster, Snowflake and Metabase ☆11 · Jul 13, 2023 · Updated 2 years ago
- A service implementing the Carbon protocol and storing time series data using kairos ☆42 · Mar 11, 2021 · Updated 5 years ago
- A kafka streams client library built on confluent-kafka-python ☆66 · Sep 28, 2023 · Updated 2 years ago
- Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab. ☆15 · Sep 3, 2021 · Updated 4 years ago
- 🐍 Quick reference guide to common patterns & functions in PySpark. ☆666 · Feb 21, 2023 · Updated 3 years ago
- How to use Presto (with Hive metastore) and MinIO? ☆27 · Mar 8, 2023 · Updated 3 years ago
- The klient utility provides a cli for basic kafka cluster operations and topic IO ☆16 · Aug 21, 2025 · Updated 7 months ago
- Stackable Operator for Apache Airflow ☆32 · Mar 23, 2026 · Updated last week
- Code for my "Efficient Data Processing in SQL" book. ☆61 · Aug 6, 2024 · Updated last year
- An Awesome List of Open-Source Data Engineering Projects ☆3,100 · Oct 4, 2024 · Updated last year
- A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher. ☆688 · Apr 22, 2022 · Updated 3 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python… ☆48 · Mar 14, 2024 · Updated 2 years ago
- Data Pipeline from the Global Historical Climatology Network DataSet ☆27 · Dec 20, 2022 · Updated 3 years ago
- Step by step instructions to create a production-ready data pipeline ☆58 · Dec 23, 2024 · Updated last year
- Data engineering with dbt, published by Packt ☆96 · Sep 2, 2025 · Updated 6 months ago
- A list of useful resources to learn Data Engineering from scratch ☆3,965 · Jun 19, 2024 · Updated last year
- ☆23 · Jan 3, 2022 · Updated 4 years ago