Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO
☆65Jul 21, 2023Updated 2 years ago
Alternatives and similar repositories for streaming_data_processing
Users who are interested in streaming_data_processing are comparing it to the libraries listed below.
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra☆146Jul 27, 2023Updated 2 years ago
- Glue ETL job or EMR Spark that gets from data catalog, modifies and uploads to S3 and Data Catalog☆13Aug 26, 2023Updated 2 years ago
- Create a chatbot that provides responses in Vietnamese, focusing on the products offered by a flower shop☆11Nov 14, 2024Updated last year
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB.☆43Sep 26, 2023Updated 2 years ago
- Repository for Data Engineering Zoomcamp 2024☆14Mar 25, 2024Updated 2 years ago
- Creating a Data Pipeline for Stock Data☆14Jan 12, 2024Updated 2 years ago
- Business challenge that requires building a data platform for retailer data analytics.☆18Feb 19, 2023Updated 3 years ago
- End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API…☆21Jul 26, 2024Updated last year
- An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Az…☆30Oct 2, 2023Updated 2 years ago
- Comparison between label, one-hot, target, and cross-fold target encoding☆13Mar 5, 2019Updated 7 years ago
- ☆37Jul 18, 2025Updated 10 months ago
- A series on learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems☆55Sep 30, 2023Updated 2 years ago
- The goal of this project is to classify the actions taken by the user (walking, climbing stairs and descending stairs) from the 3D accel…☆11Jan 17, 2020Updated 6 years ago
- ☆46Jul 6, 2024Updated last year
- 커버리스트 - AI book cover generation service☆13Sep 11, 2022Updated 3 years ago
- JAX/Flax inference code for StarCoder☆12Jun 12, 2023Updated 3 years ago
- Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from…☆37Jan 23, 2025Updated last year
- Fully dockerized Data Warehouse (DWH) using Airflow, dbt, PostgreSQL and dashboard using redash☆26Nov 12, 2022Updated 3 years ago
- 🏅Toss NEXT ML CHALLENGE: repository for submitting the 5th-place model in the ad click prediction (CTR) competition🏅☆25Feb 2, 2026Updated 4 months ago
- FastAPI CLI is a command-line tool designed to help developers quickly generate a structured project file system for FastAPI applications…☆12Feb 3, 2025Updated last year
- A simple type-safe Scala wrapper for Google App Engine Datastore☆15Jun 15, 2015Updated 10 years ago
- 🥈12th place solution on G2Net Detecting Continuous Gravitational Waves🥈☆14Jan 4, 2023Updated 3 years ago
- This repo contains a spark standalone cluster on docker for anyone who wants to play with PySpark by submitting their applications.☆37Jun 9, 2023Updated 3 years ago
- Base structure for a ML or DL project☆13Jun 9, 2025Updated last year
- Stock Market predictions with Prophet and FastAPI☆17Dec 22, 2021Updated 4 years ago
- End to end data pipeline to extract and analyze submissions from any subreddit using Pushshift, python, dbt and BigQuery.☆12Jul 17, 2023Updated 2 years ago
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR☆39Feb 17, 2025Updated last year
- ☆12Jul 17, 2024Updated last year
- Basketball Statistics Demo☆11Oct 18, 2016Updated 9 years ago
- Modeling customer churn with Spark☆12Jan 24, 2019Updated 7 years ago
- 🎖️ 5th place solution in the Google American Sign Language Fingerspelling Recognition Competition🎖️☆16Sep 19, 2023Updated 2 years ago
- Toolset for detecting reflected xss in websites☆16Oct 6, 2018Updated 7 years ago
- (Python, PySpark)☆11Nov 15, 2020Updated 5 years ago
- Code repository for the End-to-End Data Science in SAS book☆16Aug 22, 2020Updated 5 years ago
- Code for 'Contrastive Multi-Document Question Generation'☆11Oct 16, 2022Updated 3 years ago
- This repository contains everything you need to become proficient in System Design and Case Studies with Code Implementation☆18Jan 27, 2024Updated 2 years ago
- ☆20Aug 27, 2024Updated last year
- An MCP server that provides seamless access to a user's Last.fm listening data and music information via AI assistants like Claude.☆40May 6, 2026Updated last month
- asynchronous Ekşi Sözlük scraper☆10Feb 28, 2026Updated 3 months ago