A list of free datasets that provide streaming data
☆444May 29, 2026Updated 3 weeks ago
Alternatives and similar repositories for awesome-public-streaming-datasets
Users that are interested in awesome-public-streaming-datasets are comparing it to the libraries listed below.
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.☆97Jan 21, 2024Updated 2 years ago
- A list of publicly available datasets with real-time data maintained by the team at bytewax.io☆2,507Apr 13, 2026Updated 2 months ago
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.☆544Jan 27, 2026Updated 4 months ago
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!☆882Apr 16, 2022Updated 4 years ago
- A Kubernetes Operator to orchestrate Benthos pipelines☆43Jul 8, 2024Updated last year
- Simplify Big Data Analytics with Amazon EMR, published by Packt☆13Jan 18, 2023Updated 3 years ago
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more☆434Nov 28, 2023Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆48Dec 11, 2023Updated 2 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP!☆12Jul 6, 2023Updated 2 years ago
- Hudi Demo Notebook☆11Mar 5, 2024Updated 2 years ago
- Trying out the Dataframe Polars library with Delta Lake ... feat Python.☆12Jan 29, 2025Updated last year
- A walkthrough of setting up a Kinesis Data Analytics for Java Application which ingest streaming JSON data and leverages the Flink Table …☆16Aug 30, 2023Updated 2 years ago
- Kubernetes deployment of PrestoDB, Hive Metastore, and Minio S3-standard object store☆17Oct 20, 2022Updated 3 years ago
- Sample code for building a Python application for Apache Flink on Kinesis Data Analytics.☆14Aug 30, 2023Updated 2 years ago
- This is a sample project that demonstrates a data ETL (Extract, Transform, Load) process using Python, Polars and AWS Loca…☆15Sep 25, 2023Updated 2 years ago
- A topic-centric list of HQ open datasets.☆75,979Updated this week
- A service implementing the Carbon protocol and storing time series data using kairos☆42Mar 11, 2021Updated 5 years ago
- 🐍 Quick reference guide to common patterns & functions in PySpark.☆689Feb 21, 2023Updated 3 years ago
- Creating Lambda Functions to Ingest Data from APIs with AWS CDK☆13Dec 1, 2021Updated 4 years ago
- The klient utility provides a cli for basic kafka cluster operations and topic IO☆16Aug 21, 2025Updated 9 months ago
- A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate.☆40May 20, 2026Updated last month
- Code for my "Efficient Data Processing in SQL" book.☆63Aug 6, 2024Updated last year
- An Awesome List of Open-Source Data Engineering Projects☆3,211Oct 4, 2024Updated last year
- A Redis clone written in Go☆43Aug 20, 2016Updated 9 years ago
- A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.☆690Apr 22, 2022Updated 4 years ago
- Data Pipeline from the Global Historical Climatology Network DataSet☆27Dec 20, 2022Updated 3 years ago
- Step by step instructions to create a production-ready data pipeline☆62Dec 23, 2024Updated last year
- Data engineering with dbt, published by Packt☆103Sep 2, 2025Updated 9 months ago
- A list of useful resources to learn Data Engineering from scratch☆3,994Jun 19, 2024Updated 2 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de…☆12Nov 18, 2023Updated 2 years ago
- Markdown auto-formatting, beautification, and cleanup for Atom☆45Mar 4, 2023Updated 3 years ago
- Example end to end data engineering project.☆1,411Dec 8, 2022Updated 3 years ago
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you…☆11Nov 18, 2025Updated 7 months ago
- Portfolio Site☆19Dec 28, 2025Updated 5 months ago
- Price Crawler - Tracking Price Inflation☆205Jun 23, 2020Updated 5 years ago
- The Data Engineering Cookbook☆15,142Jun 12, 2026Updated last week