A list of free datasets that provide streaming data
☆440Apr 13, 2026Updated 3 weeks ago
Alternatives and similar repositories for awesome-public-streaming-datasets
Users that are interested in awesome-public-streaming-datasets are comparing it to the libraries listed below.
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.☆96Jan 21, 2024Updated 2 years ago
- ☆14May 5, 2023Updated 3 years ago
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.☆542Jan 27, 2026Updated 3 months ago
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!☆874Apr 16, 2022Updated 4 years ago
- A Kubernetes Operator to orchestrate Benthos pipelines☆43Jul 8, 2024Updated last year
- Simplify Big Data Analytics with Amazon EMR, published by Packt☆13Jan 18, 2023Updated 3 years ago
- Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more☆433Nov 28, 2023Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆47Dec 11, 2023Updated 2 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP!☆12Jul 6, 2023Updated 2 years ago
- Trying out the Dataframe Polars library with Delta Lake ... feat Python.☆12Jan 29, 2025Updated last year
- Kubernetes deployment of PrestoDB, Hive Metastore, and Minio S3-standard object store☆17Oct 20, 2022Updated 3 years ago
- This is a sample project that demonstrates an ETL (Extract, Transform, Load) data process using Python, Polars and AWS Loca…☆15Sep 25, 2023Updated 2 years ago
- Source code for the post, 'Getting Started with Data Analysis on AWS, using S3, Glue, Amazon Athena, and QuickSight'☆29Dec 22, 2020Updated 5 years ago
- ☆10Mar 12, 2021Updated 5 years ago
- CLI tool to manage Kafka connectors☆10Mar 2, 2024Updated 2 years ago
- A tool to generate PySpark schema from JSON.☆29Jan 21, 2024Updated 2 years ago
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo…☆40,666May 3, 2026Updated last week
- End-to-End ELT data pipeline with Postgres, Airbyte, dbt, Dagster, Snowflake and Metabase☆11Jul 13, 2023Updated 2 years ago
- A topic-centric list of HQ open datasets.☆75,010Apr 28, 2026Updated last week
- A service implementing the Carbon protocol and storing time series data using kairos☆42Mar 11, 2021Updated 5 years ago
- Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.☆15Sep 3, 2021Updated 4 years ago
- 🐍 Quick reference guide to common patterns & functions in PySpark.☆673Feb 21, 2023Updated 3 years ago
- Creating Lambda Functions to Ingest Data from APIs with AWS CDK☆13Dec 1, 2021Updated 4 years ago
- The klient utility provides a cli for basic kafka cluster operations and topic IO☆16Aug 21, 2025Updated 8 months ago
- A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate.☆40Updated this week
- Tinybird Node.js SDK☆13Dec 26, 2022Updated 3 years ago
- Code for my "Efficient Data Processing in SQL" book.☆62Aug 6, 2024Updated last year
- An Awesome List of Open-Source Data Engineering Projects☆3,173Oct 4, 2024Updated last year
- A Redis clone written in Go☆43Aug 20, 2016Updated 9 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python…☆48Mar 14, 2024Updated 2 years ago
- Data Pipeline from the Global Historical Climatology Network DataSet☆27Dec 20, 2022Updated 3 years ago
- Step by step instructions to create a production-ready data pipeline☆60Dec 23, 2024Updated last year
- An open and introductory book for the Python API of Apache Spark (pyspark) 📚📖☆12Sep 19, 2025Updated 7 months ago
- Data engineering with dbt, published by Packt☆102Sep 2, 2025Updated 8 months ago
- ☆23Jan 3, 2022Updated 4 years ago
- A list of useful resources to learn Data Engineering from scratch☆3,990Jun 19, 2024Updated last year
- Markdown auto-formatting, beautification, and cleanup for Atom☆45Mar 4, 2023Updated 3 years ago
- Example end to end data engineering project.☆1,411Dec 8, 2022Updated 3 years ago
- This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you…☆11Nov 18, 2025Updated 5 months ago