A list of free datasets that provide streaming data
☆441Apr 13, 2026Updated last month
Alternatives and similar repositories for awesome-public-streaming-datasets
Users who are interested in awesome-public-streaming-datasets are comparing it to the repositories listed below.
- A list of publicly available datasets with real-time data maintained by the team at bytewax.io☆2,484Apr 13, 2026Updated last month
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.☆542Jan 27, 2026Updated 4 months ago
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!☆881Apr 16, 2022Updated 4 years ago
- A Kubernetes Operator to orchestrate Benthos pipelines☆43Jul 8, 2024Updated last year
- Simplify Big Data Analytics with Amazon EMR, published by Packt☆13Jan 18, 2023Updated 3 years ago
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more☆434Nov 28, 2023Updated 2 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP!☆12Jul 6, 2023Updated 2 years ago
- Trying out the Dataframe Polars library with Delta Lake ... feat Python.☆12Jan 29, 2025Updated last year
- Kubernetes deployment of PrestoDB, Hive Metastore, and Minio S3-standard object store☆17Oct 20, 2022Updated 3 years ago
- A walkthrough of setting up a Kinesis Data Analytics for Java Application which ingests streaming JSON data and leverages the Flink Table …☆16Aug 30, 2023Updated 2 years ago
- Sample code for building a Python application for Apache Flink on Kinesis Data Analytics.☆14Aug 30, 2023Updated 2 years ago
- This is a sample project that demonstrates an ETL (Extract, Transform, Load) data process using Python, Polars and AWS Loca…☆15Sep 25, 2023Updated 2 years ago
- Source code for the post, 'Getting Started with Data Analysis on AWS, using S3, Glue, Amazon Athena, and QuickSight'☆29Dec 22, 2020Updated 5 years ago
- Using Kafka to track cryptocurrency price trends☆68Apr 17, 2023Updated 3 years ago
- ☆10Mar 12, 2021Updated 5 years ago
- CLI tool to manage Kafka connectors☆10Mar 2, 2024Updated 2 years ago
- A python tool scraping Aiven services metadata and building a connected graph☆15Aug 28, 2025Updated 9 months ago
- End-to-End ELT data pipeline with Postgres, Airbyte, dbt, Dagster, Snowflake and Metabase☆11Jul 13, 2023Updated 2 years ago
- A topic-centric list of HQ open datasets.☆75,618May 23, 2026Updated last week
- Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.☆15Sep 3, 2021Updated 4 years ago
- 🐍 Quick reference guide to common patterns & functions in PySpark.☆679Feb 21, 2023Updated 3 years ago
- Creating Lambda Functions to Ingest Data from APIs with AWS CDK☆13Dec 1, 2021Updated 4 years ago
- The klient utility provides a cli for basic kafka cluster operations and topic IO☆16Aug 21, 2025Updated 9 months ago
- Code for my "Efficient Data Processing in SQL" book.☆62Aug 6, 2024Updated last year
- An Awesome List of Open-Source Data Engineering Projects☆3,189Oct 4, 2024Updated last year
- A Redis clone written in Go☆43Aug 20, 2016Updated 9 years ago
- A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.☆690Apr 22, 2022Updated 4 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python…☆48Mar 14, 2024Updated 2 years ago
- Data Pipeline from the Global Historical Climatology Network DataSet☆27Dec 20, 2022Updated 3 years ago
- Step by step instructions to create a production-ready data pipeline☆61Dec 23, 2024Updated last year
- Data engineering with dbt, published by Packt☆103Sep 2, 2025Updated 8 months ago
- A list of useful resources to learn Data Engineering from scratch☆3,996Jun 19, 2024Updated last year
- ☆23Jan 3, 2022Updated 4 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de…☆12Nov 18, 2023Updated 2 years ago
- Example end to end data engineering project.☆1,409Dec 8, 2022Updated 3 years ago
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you…☆11Nov 18, 2025Updated 6 months ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year.☆16Jan 23, 2025Updated last year
- End-to-end data platform leveraging the Modern data stack☆52Apr 10, 2024Updated 2 years ago
- Flink, Presto, Trino TPC-DS benchmark☆16Feb 20, 2023Updated 3 years ago