XavientInformationSystems/Data-Ingestion-Platform

XavientInformationSystems / Data-Ingestion-Platform

☆50

Alternatives and similar repositories for Data-Ingestion-Platform

Users who are interested in Data-Ingestion-Platform are comparing it to the libraries listed below.

dk-stationery / stationery-ink
Distributed SQL-based Realtime Streaming Computation Framework on Apache Storm, Spark
☆12 • Mar 14, 2016 • Updated 10 years ago
yamrcraft / etl-light
A light Kafka to HDFS/S3 ETL library based on Apache Spark
☆40 • Jun 29, 2017 • Updated 8 years ago
hougs / fantasy-football
Choosing a fantasy football team using spark, hive, python, and really just about anything.
☆20 • Feb 13, 2015 • Updated 11 years ago
sudar / learn-python-hard-way-exercises
This repo contains the exercises I did while reading through the book Learn Python the Hard Way by Zed A. Shaw
☆16 • Sep 23, 2017 • Updated 8 years ago
datafibers-community / df_data_service
DataFibers Data Service
☆31 • Feb 11, 2022 • Updated 4 years ago
amient / affinity
Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka
☆25 • Oct 16, 2020 • Updated 5 years ago
shengjk / flinksql-platform
flinksql-platform
☆19 • Mar 22, 2021 • Updated 5 years ago
wuworker / netty-proxy
A proxy server implemented with Netty
☆11 • Nov 17, 2019 • Updated 6 years ago
fluency03 / blockchain-in-scala
💸💸💸 A simplified Blockchain implementation in Scala based on the specifications of Bitcoin.
☆13 • May 13, 2018 • Updated 7 years ago
jeoffreylim / maelstrom
Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream …
☆22 • Feb 6, 2017 • Updated 9 years ago
ingesttips / examples
Data ingestion examples
☆11 • Feb 12, 2015 • Updated 11 years ago
bluecolor / octopus
Open source task scheduler with dependency management
☆15 • Jul 1, 2018 • Updated 7 years ago
RedisLabs / ReSearch
Redis search and indexing in Java
☆16 • Sep 26, 2016 • Updated 9 years ago
ZubairNabi / prosparkstreaming
Code used in "Pro Spark Streaming: The Zen of Real-time Analytics using Apache Spark" published by Apress Publishing.
☆48 • Mar 27, 2016 • Updated 10 years ago
randerzander / HiveToPhoenix
An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase
☆14 • Mar 23, 2016 • Updated 10 years ago
milinda / samza-sql
SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka
☆29 • Jun 8, 2016 • Updated 9 years ago
JerryLead / SparkFaultBench
A Spark Reliability Testing Suite
☆13 • Jan 10, 2017 • Updated 9 years ago
ycloudnet / ya100
A storage format that is 5~100 times faster than Spark-Parquet
☆12 • Feb 22, 2016 • Updated 10 years ago
indix / sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
☆29 • May 15, 2020 • Updated 5 years ago
RedBeard0531 / mongo-oplog-watcher
A Python class to make it easier to write triggers for MongoDB
☆22 • May 8, 2017 • Updated 8 years ago
jerzygangi / forklift
🚚 ETL for Spark and Airflow
☆25 • Mar 19, 2018 • Updated 8 years ago
rtahboub / spark-sql-customized-parser
An experiment to inject a customized parser using SparkSessionExtension
☆16 • Jan 1, 2018 • Updated 8 years ago
vvaks0 / DeviceManagerDemo
The Device Manager Demo is designed to demonstrate a fully functioning modern Data/IoT application. It is a Lambda architecture built usi…
☆13 • Aug 31, 2017 • Updated 8 years ago
vngrs / spark-etl
Apache Spark based ETL Engine
☆71 • Oct 18, 2016 • Updated 9 years ago
bahaaldine / scalable-big-data-architecture
Assets used in Apress -- Scalable Big Data Architecture -- book
☆20 • Dec 11, 2015 • Updated 10 years ago
homeaway / datapull
Cloud based Data Platform based on Apache Spark
☆27 • Feb 17, 2026 • Updated last month
realxujiang / storm-kafka-examples
storm kafka hdfs examples
☆21 • Nov 28, 2016 • Updated 9 years ago
andypetrella / spark-bd
Exploration of spark streaming based on the BigData.be project 2
☆15 • Sep 2, 2013 • Updated 12 years ago
zengxiaosen / flinkMultiStreamOptimization
Optimizes Flink multi-stream operations (e.g. join); the optimizations include, but are not limited to, data loss and performance issues
☆11 • Apr 8, 2019 • Updated 7 years ago
tresata / spark-kafka
Low level integration of Spark and Kafka
☆131 • Mar 15, 2018 • Updated 8 years ago
saurzcode / twitter-stream
Twitter-Kafka Data Pipeline
☆16 • Nov 19, 2024 • Updated last year
mispecto / realtime-dashboard-example
This is a real-time dashboard example using Spark Streaming and Node.js
☆26 • Dec 17, 2025 • Updated 3 months ago
fhussonnois / storm-trident-elasticsearch
Trident State implementation on top of Elasticsearch
☆21 • May 18, 2015 • Updated 10 years ago
NextMark / datashops
A distributed data factory, providing data access, etl, scheduling. Easily manage tasks such as hive, spark, clickhouse, flink, shell, py…
☆33 • May 21, 2022 • Updated 3 years ago
JerryLead / SparkProfiler
Profiling Spark Applications for Performance Comparison and Diagnosis
☆17 • Nov 11, 2018 • Updated 7 years ago
Kent7306 / akkaflow
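akkaflow is a distributed, highly available DAG workflow scheduling tool built on the Akka architecture; it can assign child nodes to cluster machines for parallel execution, making efficient use of cluster resources.
☆107 • Sep 14, 2019 • Updated 6 years ago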
levelfour / pumil
Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags
☆10 • Apr 28, 2018 • Updated 7 years ago
cpbaranwal / Avro-SparkStreaming-Kafka
Code for processing AVRO data in Spark Streaming + Kafka (DirectKafka approach with custom offset management in zookeeper)
☆29 • Sep 9, 2016 • Updated 9 years ago
sshahriyar / totalads
Total Anomaly Detection System for software logs and traces
☆10 • Dec 7, 2015 • Updated 10 years ago