xavient/Data-Ingestion-Platform

☆51

Alternatives and similar repositories for Data-Ingestion-Platform

Users who are interested in Data-Ingestion-Platform are comparing it to the libraries listed below.

dk-stationery / stationery-ink
Distributed SQL-based real-time streaming computation framework on Apache Storm and Spark
☆12 · Mar 14, 2016 · Updated 10 years ago
yamrcraft / etl-light
A light Kafka to HDFS/S3 ETL library based on Apache Spark
☆40 · Jun 29, 2017 · Updated 9 years ago
hougs / fantasy-football
Choosing a fantasy football team using spark, hive, python, and really just about anything.
☆20 · Feb 13, 2015 · Updated 11 years ago
datafibers-community / df_data_service
DataFibers Data Service
☆31 · Feb 11, 2022 · Updated 4 years ago
sudar / learn-python-hard-way-exercises
This repo contains the exercises I did while reading through the book Learn Python the Hard Way by Zed A. Shaw
☆16 · Sep 23, 2017 · Updated 8 years ago
amient / affinity
Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka
☆25 · Oct 16, 2020 · Updated 5 years ago
wuworker / netty-proxy
A proxy server implemented on Netty
☆12 · Jul 4, 2026 · Updated 2 weeks ago
fluency03 / blockchain-in-scala
💸💸💸 A simplified Blockchain implementation in Scala based on the specifications of Bitcoin.
☆14 · May 13, 2018 · Updated 8 years ago
jeoffreylim / maelstrom
Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream …
☆21 · Feb 6, 2017 · Updated 9 years ago
ingesttips / examples
Data ingestion examples
☆11 · Feb 12, 2015 · Updated 11 years ago
coreyauger / typebus
Framework for building distributed microservices in Scala with akka-streams and Kafka
☆15 · Nov 10, 2019 · Updated 6 years ago
RedisLabs / ReSearch
Redis search and indexing in Java
☆16 · Sep 26, 2016 · Updated 9 years ago
bluecolor / octopus
Open source task scheduler with dependency management
☆15 · Jul 1, 2018 · Updated 8 years ago
randerzander / HiveToPhoenix
An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase
☆14 · Mar 23, 2016 · Updated 10 years ago
ZubairNabi / prosparkstreaming
Code used in "Pro Spark Streaming: The Zen of Real-time Analytics using Apache Spark" published by Apress Publishing.
☆48 · Mar 27, 2016 · Updated 10 years ago
indix / sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
☆28 · May 15, 2020 · Updated 6 years ago
milinda / samza-sql
SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka
☆30 · Jun 8, 2016 · Updated 10 years ago
jerzygangi / forklift
🚚 ETL for Spark and Airflow
☆25 · Mar 19, 2018 · Updated 8 years ago
fraibacas / lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
☆12 · May 31, 2024 · Updated 2 years ago
rtahboub / spark-sql-customized-parser
An experiment to inject a customized parser using SparkSessionExtension
☆16 · Jan 1, 2018 · Updated 8 years ago
chenzhenyang / aquila
A migration tool for one-way migration from Oracle, MySQL, and SqlServer to PostgreSQL, and two-way migration between PostgreSQL and big data platforms such as Hive, Hbase, and Impala.
☆10 · Dec 3, 2014 · Updated 11 years ago
vvaks0 / DeviceManagerDemo
The Device Manager Demo is designed to demonstrate a fully functioning modern Data/IoT application. It is a Lambda architecture built usi…
☆13 · Aug 31, 2017 · Updated 8 years ago
vngrs / spark-etl
Apache Spark based ETL Engine
☆71 · Oct 18, 2016 · Updated 9 years ago
AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
bahaaldine / scalable-big-data-architecture
Assets used in Apress -- Scalable Big Data Architecture -- book
☆19 · Dec 11, 2015 · Updated 10 years ago
homeaway / datapull
Cloud based Data Platform based on Apache Spark
☆28 · Jun 30, 2026 · Updated 3 weeks ago
realxujiang / storm-kafka-examples
storm kafka hdfs examples
☆21 · Nov 28, 2016 · Updated 9 years ago
japerry911 / crypto-data-pipeline
Data Pipeline that utilizes GCP, Python 3.10, Prefect, and more.
☆10 · Jan 23, 2023 · Updated 3 years ago
zengxiaosen / flinkMultiStreamOptimization
Optimizes Flink's multi-stream operations (such as join); the optimizations include, but are not limited to, data loss and performance issues.
☆11 · Apr 8, 2019 · Updated 7 years ago
tresata / spark-kafka
Low level integration of Spark and Kafka
☆129 · Mar 15, 2018 · Updated 8 years ago
faizeraza / dataengineering-github-data-pipelineline
In this project I have built an ETL pipeline which scrapes the trending repositories based on month, week and day LIVE extract other related info…
☆12 · Sep 9, 2023 · Updated 2 years ago
NextMark / datashops
A distributed data factory, providing data access, etl, scheduling. Easily manage tasks such as hive, spark, clickhouse, flink, shell, py…
☆34 · May 21, 2022 · Updated 4 years ago
mispecto / realtime-dashboard-example
This is a real-time dashboard example using Spark Streaming and Node.js
☆25 · Dec 17, 2025 · Updated 7 months ago
sergio11 / document_search_engine_architecture
📄🚀 Unleash a powerful Document Search Engine with Apache NiFi for lightning-fast, comprehensive text indexing and search.
☆30 · Nov 26, 2025 · Updated 7 months ago
cpbaranwal / Avro-SparkStreaming-Kafka
Code for processing AVRO data in Spark Streaming + Kafka (DirectKafka approach with custom offset management in zookeeper)
☆29 · Sep 9, 2016 · Updated 9 years ago
Kent7306 / akkaflow
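akkaflow is a distributed, highly available DAG workflow scheduling tool built on the Akka architecture; it can distribute child nodes across cluster machines for parallel execution, making efficient use of cluster resources.
☆106 · Sep 14, 2019 · Updated 6 years ago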
JerryLead / SparkProfiler
Profiling Spark Applications for Performance Comparison and Diagnosis
☆16 · Nov 11, 2018 · Updated 7 years ago
xmlking / cdc-kafka-hadoop
MySQL to NoSQL real time dataflow
☆19 · Oct 14, 2017 · Updated 8 years ago
rmetzger / flink-streaming-etl
A demo repository for "streaming etl" with Apache Flink
☆44 · Jun 8, 2016 · Updated 10 years ago