A list of free datasets that provide streaming data
☆439, updated Apr 13, 2026
Alternatives and similar repositories for awesome-public-streaming-datasets
Users interested in awesome-public-streaming-datasets also compare it to the repositories listed below.
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. (☆95, updated Jan 21, 2024)
- A list of publicly available datasets with real-time data maintained by the team at bytewax.io (☆2,406, updated Apr 13, 2026)
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. (☆538, updated Jan 27, 2026)
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! (☆872, updated Apr 16, 2022)
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more (☆433, updated Nov 28, 2023)
- This repository contains the code for a real-time election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… (☆47, updated Dec 11, 2023)
- Hudi Demo Notebook (☆11, updated Mar 5, 2024)
- Trying out the Polars DataFrame library with Delta Lake, featuring Python. (☆12, updated Jan 29, 2025)
- Kubernetes deployment of PrestoDB, Hive Metastore, and MinIO S3-compatible object store (☆17, updated Oct 20, 2022)
- A walkthrough of setting up a Kinesis Data Analytics for Java application that ingests streaming JSON data and leverages the Flink Table … (☆16, updated Aug 30, 2023)
- Sample code for building a Python application for Apache Flink on Kinesis Data Analytics. (☆14, updated Aug 30, 2023)
- A sample project that demonstrates an ETL (Extract, Transform, Load) data process using Python, Polars, and AWS Loca… (☆15, updated Sep 25, 2023)
- Mockingbird is a mock streaming data generator (☆136, updated Feb 6, 2025)
- Source code for the post 'Getting Started with Data Analysis on AWS, using S3, Glue, Amazon Athena, and QuickSight' (☆29, updated Dec 22, 2020)
- No description (☆10, updated Mar 12, 2021)
- CLI tool to manage Kafka connectors (☆10, updated Mar 2, 2024)
- A tool to generate a PySpark schema from JSON. (☆29, updated Jan 21, 2024)
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo… (☆40,011, updated Apr 8, 2026)
- End-to-end ELT data pipeline with Postgres, Airbyte, dbt, Dagster, Snowflake and Metabase (☆11, updated Jul 13, 2023)
- A service implementing the Carbon protocol and storing time series data using kairos (☆42, updated Mar 11, 2021)
- Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab. (☆15, updated Sep 3, 2021)
- 🐍 Quick reference guide to common patterns & functions in PySpark. (☆667, updated Feb 21, 2023)
- The klient utility provides a CLI for basic Kafka cluster operations and topic I/O (☆16, updated Aug 21, 2025)
- Tinybird Node.js SDK (☆13, updated Dec 26, 2022)
- An Awesome List of Open-Source Data Engineering Projects (☆3,143, updated Oct 4, 2024)
- A comprehensive Spark guide collated from multiple sources, useful for learning Spark or as an interview refresher. (☆689, updated Apr 22, 2022)
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python… (☆48, updated Mar 14, 2024)
- Data pipeline for the Global Historical Climatology Network dataset (☆27, updated Dec 20, 2022)
- Step-by-step instructions to create a production-ready data pipeline (☆59, updated Dec 23, 2024)
- A list of useful resources to learn data engineering from scratch (☆3,984, updated Jun 19, 2024)
- No description (☆23, updated Jan 3, 2022)
- Example end-to-end data engineering project. (☆1,404, updated Dec 8, 2022)
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you… (☆11, updated Nov 18, 2025)
- End-to-end data platform leveraging the modern data stack (☆52, updated Apr 10, 2024)
- A basic Apache Pinot example for ingesting real-time MySQL change logs using Debezium (☆28, updated Jan 8, 2021)
- Price Crawler: tracking price inflation (☆203, updated Jun 23, 2020)
- No description (☆36, updated Jun 3, 2023)
- The Data Engineering Cookbook (☆15,054, updated Jan 17, 2026)
- No description (☆47, updated Jul 25, 2022)
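Two of the listed event data simulators describe generating a stream of pseudo-random events from a set of users to simulate web traffic. The following is a minimal Python sketch of that idea, not the code of either repository. The event types, page paths, weights, and the use of exponential inter-arrival gaps are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass, asdict
from itertools import islice
from typing import Iterator, Optional

# Hypothetical event types, weights, and page paths (illustrative assumptions only).
EVENT_TYPES = ["page_view", "click", "add_to_cart", "purchase"]
EVENT_WEIGHTS = [70, 20, 7, 3]
PAGES = ["/", "/products", "/cart", "/checkout"]


@dataclass
class Event:
    user_id: int
    event_type: str
    page: str
    timestamp: float  # seconds since start_time


def simulate_events(
    num_users: int,
    seed: Optional[int] = None,
    start_time: float = 0.0,
    mean_gap_s: float = 0.5,
) -> Iterator[Event]:
    """Yield an endless stream of pseudo-random events from a fixed pool of users."""
    rng = random.Random(seed)  # a seeded RNG makes the stream reproducible
    t = start_time
    while True:
        # Exponentially distributed gaps approximate a Poisson arrival process.
        t += rng.expovariate(1.0 / mean_gap_s)
        yield Event(
            user_id=rng.randrange(num_users),
            event_type=rng.choices(EVENT_TYPES, weights=EVENT_WEIGHTS)[0],
            page=rng.choice(PAGES),
            timestamp=t,
        )


if __name__ == "__main__":
    # Print the first three events of a reproducible stream from 5 users.
    for event in islice(simulate_events(num_users=5, seed=42), 3):
        print(asdict(event))
```

Because the generator is infinite, a consumer takes a bounded slice with `itertools.islice` or forwards events to a sink (for example, a message queue) as they are produced.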