Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO
☆65 · Jul 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for streaming_data_processing
Users who are interested in streaming_data_processing are comparing it to the repositories listed below.
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra ☆146 · Jul 27, 2023 · Updated 2 years ago
- Writes the CSV file to Postgres, read table and modify it. Write more tables to Postgres with Airflow. ☆37 · Sep 1, 2023 · Updated 2 years ago
- Glue ETL job or EMR Spark that gets from data catalog, modifies and uploads to S3 and Data Catalog ☆13 · Aug 26, 2023 · Updated 2 years ago
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB. ☆43 · Sep 26, 2023 · Updated 2 years ago
- Business challenge that requires building a data platform for retailer data analytics. ☆18 · Feb 19, 2023 · Updated 3 years ago
- An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Az… ☆32 · Oct 2, 2023 · Updated 2 years ago
- Comparison between label, one-hot, target, and cross-fold target encoding ☆13 · Mar 5, 2019 · Updated 7 years ago
- Series follows learning from Apache Spark (PySpark) with quick tips and workaround for daily problems in hand ☆56 · Sep 30, 2023 · Updated 2 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆50 · Dec 4, 2023 · Updated 2 years ago
- Transparent at-rest AES encryption for Firebase. ☆16 · Updated this week
- Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from… ☆38 · Jan 23, 2025 · Updated last year
- Fully dockerized Data Warehouse (DWH) using Airflow, dbt, PostgreSQL and dashboard using redash ☆25 · Nov 12, 2022 · Updated 3 years ago
- a PostgreSQL extension that allows you to set quotas on connections (per user, database or IP) ☆13 · Dec 12, 2014 · Updated 11 years ago
- Textual numeric datatypes for PostgreSQL ☆10 · Oct 8, 2025 · Updated 6 months ago
- A simple tutorial script on Streamlit using the Iris Dataset ☆13 · Sep 13, 2023 · Updated 2 years ago
- Base structure for a ML or DL project ☆13 · Jun 9, 2025 · Updated 10 months ago
- Stock Market predictions with Prophet and FastAPI ☆17 · Dec 22, 2021 · Updated 4 years ago
- End to end data pipeline to extract and analyze submissions from any subreddit using Pushshift, python, dbt and BigQuery. ☆12 · Jul 17, 2023 · Updated 2 years ago
- Real-time Credit card Fraud detection using Spark Streaming, Spark ML, Spark SQL, Kafka, Cassandra and Airflow ☆11 · Jul 1, 2022 · Updated 3 years ago
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR ☆39 · Feb 17, 2025 · Updated last year
- V-gram indexing for PostgreSQL ☆12 · Jul 30, 2025 · Updated 8 months ago
- A sphinx extension for adding pyscript to a page ☆15 · Updated this week
- PostgreSQL fulltext search addon ☆12 · Mar 21, 2018 · Updated 8 years ago
- This project demonstrates real-time data streaming and processing architecture using Kafka, Spark Streaming, and Debezium for capturing C… ☆13 · Oct 24, 2024 · Updated last year
- (Python, PySpark) ☆11 · Nov 15, 2020 · Updated 5 years ago
- ☆20 · Mar 9, 2026 · Updated last month
- Basic ChatBot using CTransformers, ChromaDB and Gradio. Configured for CPU. ☆12 · Nov 23, 2023 · Updated 2 years ago
- This repository contains everything you need to become proficient in System Design and Case Studies with Code Implementation ☆18 · Jan 27, 2024 · Updated 2 years ago
- ☆20 · Aug 27, 2024 · Updated last year
- Examples to showcase the use of Kale in data science pipelines ☆27 · Mar 24, 2023 · Updated 3 years ago
- ☆10 · May 30, 2021 · Updated 4 years ago
- asynchronous Ekşi Sözlük scraper ☆10 · Feb 28, 2026 · Updated last month
- lz4 compression for PostgreSQL 12+ (dev) ☆22 · Oct 16, 2021 · Updated 4 years ago
- Oracle-style global temporary tables for PostgreSQL ☆18 · Jan 15, 2019 · Updated 7 years ago
- ☆18 · Mar 25, 2026 · Updated 2 weeks ago
- This Guidance, with the sample code, can be used to deploy a carbon data lake to the AWS Cloud using an AWS Cloud Development Kit (AWS CD… ☆23 · Jan 8, 2025 · Updated last year
- restrict some SQL commands on PostgreSQL ☆13 · Mar 22, 2020 · Updated 6 years ago
- ☆18 · Mar 9, 2026 · Updated last month
- hamming distance fn for postgresql ☆22 · Jun 7, 2011 · Updated 14 years ago