Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO
☆65 · Jul 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for streaming_data_processing
Users that are interested in streaming_data_processing are comparing it to the libraries listed below.
- Get data from an API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra ☆146 · Jul 27, 2023 · Updated 2 years ago
- Writes a CSV file to Postgres, reads the table, and modifies it. Writes more tables to Postgres with Airflow. ☆37 · Sep 1, 2023 · Updated 2 years ago
- Glue ETL job or EMR Spark job that reads from the Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog ☆13 · Aug 26, 2023 · Updated 2 years ago
- Docker Apache Airflow ☆13 · Mar 1, 2023 · Updated 3 years ago
- Produce Kafka messages, consume them, and upload them into Cassandra and MongoDB. ☆43 · Sep 26, 2023 · Updated 2 years ago
- Repository for Data Engineering Zoomcamp 2024 ☆14 · Mar 25, 2024 · Updated 2 years ago
- Business challenge that requires building a data platform for retailer data analytics. ☆18 · Feb 19, 2023 · Updated 3 years ago
- End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API… ☆21 · Jul 26, 2024 · Updated last year
- An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Az… ☆31 · Oct 2, 2023 · Updated 2 years ago
- Comparison between label, one-hot, target, and cross-fold target encoding ☆13 · Mar 5, 2019 · Updated 7 years ago
- ☆47 · Jul 6, 2024 · Updated last year
- Transparent at-rest AES encryption for Firebase. ☆16 · Apr 6, 2026 · Updated last month
- Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from… ☆37 · Jan 23, 2025 · Updated last year
- Fully dockerized Data Warehouse (DWH) using Airflow, dbt, PostgreSQL and dashboard using Redash ☆26 · Nov 12, 2022 · Updated 3 years ago
- A simple tutorial script on Streamlit using the Iris Dataset ☆13 · Sep 13, 2023 · Updated 2 years ago
- Base structure for a ML or DL project ☆13 · Jun 9, 2025 · Updated 11 months ago
- This repo contains a Spark standalone cluster on Docker for anyone who wants to play with PySpark by submitting their applications. ☆37 · Jun 9, 2023 · Updated 2 years ago
- Stock Market predictions with Prophet and FastAPI ☆17 · Dec 22, 2021 · Updated 4 years ago
- End to end data pipeline to extract and analyze submissions from any subreddit using Pushshift, Python, dbt and BigQuery. ☆12 · Jul 17, 2023 · Updated 2 years ago
- Stock Prediction using LSTM, Linear Regression, ARIMA and GARCH models. Hyperparameter Optimization using Optuna framework for LSTM varia… ☆20 · Jan 29, 2026 · Updated 3 months ago
- A Sphinx extension for adding PyScript to a page ☆15 · Updated this week
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- An MCP server that provides seamless access to a user's Last.fm listening data and music information via AI assistants like Claude. ☆36 · May 6, 2026 · Updated 2 weeks ago
- This repository contains everything you need to become proficient in System Design and Case Studies with Code Implementation ☆18 · Jan 27, 2024 · Updated 2 years ago
- Google Flan T5 ☆11 · Feb 5, 2023 · Updated 3 years ago
- This extension makes VS Code work seamlessly with dbt and BigQuery ☆15 · Sep 27, 2022 · Updated 3 years ago
- ☆16 · Sep 6, 2023 · Updated 2 years ago
- ☆18 · Mar 25, 2026 · Updated last month
- Data Vault 2.0: Code generation, Vertica, Airflow ☆13 · Nov 20, 2019 · Updated 6 years ago
- ☆20 · Mar 9, 2026 · Updated 2 months ago
- ☆15 · Feb 15, 2023 · Updated 3 years ago
- A Python library for importing and exporting Metabase database configuration through the Metabase API ☆22 · Aug 2, 2024 · Updated last year
- Scripts that complement the "Optimizing a Data Vault data warehouse on the Snowflake Cloud Data Platform" webinar ☆16 · Oct 8, 2020 · Updated 5 years ago
- ☆12 · Jan 14, 2023 · Updated 3 years ago
- Testing Boring SL with DuckDB ☆33 · Aug 18, 2025 · Updated 9 months ago
- Starter application demonstrating how to connect a NestJS API to a PlanetScale MySQL database ☆11 · May 6, 2026 · Updated 2 weeks ago
- Project based learning for Data Engineering fundamentals. ☆13 · Jan 15, 2021 · Updated 5 years ago
- Docker with Airflow and Spark standalone cluster ☆264 · Aug 5, 2023 · Updated 2 years ago
- Background materials for the article "Productivity Assessment of Neural Code Completion" ☆16 · Jul 11, 2023 · Updated 2 years ago