Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
☆104Jan 22, 2024Updated 2 years ago
Alternatives and similar repositories for chombo
Users that are interested in chombo are comparing it to the libraries listed below
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark☆29Nov 4, 2024Updated last year
- Apache Spark based ETL Engine☆71Oct 18, 2016Updated 9 years ago
- A pyspark lib to validate data quality☆18Nov 11, 2022Updated 3 years ago
- Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka☆25Oct 16, 2020Updated 5 years ago
- Apache Amaterasu☆56Oct 18, 2019Updated 6 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark☆40Jun 29, 2017Updated 8 years ago
- Implementation of a Big Data (batch and stream) distributed processing engine in Java using Akka actors.☆12Feb 20, 2023Updated 3 years ago
- Scala API for distributed closures on Apache Ignite☆11Jun 6, 2015Updated 10 years ago
- Open source task scheduler with dependency management☆15Jul 1, 2018Updated 7 years ago
- Showing the relationship between ImageNet ID and labels and pytorch pre-trained model output ID and labels☆10Oct 11, 2020Updated 5 years ago
- Build configuration-driven ETL pipelines on Apache Spark☆161Oct 4, 2022Updated 3 years ago
- Transporter for integrating OpenLineage with OpenMetadata☆16Sep 10, 2025Updated 5 months ago
- Hadoop, Spark and Storm based anomaly detection implementations for data quality, cyber security, fraud detection etc.☆129Jan 22, 2024Updated 2 years ago
- Hive-JDBC-Proxy is a high-performance proxy service for HiveServer2 and Spark ThriftServer, providing load balancing and rule-based forwarding of Hive JDBC Client requests to HiveServer2 and Spark ThriftServer.☆33Apr 12, 2022Updated 3 years ago
- Set of Hadoop, Spark and Storm based tools for web and customer analytics☆34Jun 7, 2021Updated 4 years ago
- Distributed SQL query engine for running interactive analytic queries against big data sources.☆10Jul 1, 2016Updated 9 years ago
- A Chef cookbook to install the Confluent Platform☆15Jul 25, 2017Updated 8 years ago
- Optimizes Flink multi-stream operations (such as join); the optimizations cover data loss and performance issues, among others☆11Apr 8, 2019Updated 6 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol☆34Sep 8, 2022Updated 3 years ago
- Java event logs collector for hadoop and frameworks☆41Mar 25, 2025Updated 11 months ago
- Plot live-stats as graph from ApacheSpark application using Lightning-viz☆18Jul 3, 2017Updated 8 years ago
- ☆36Jul 13, 2023Updated 2 years ago
- Apache Spark ETL Utilities☆39Oct 23, 2024Updated last year
- Teiid Designer is a visual tool that enables rapid, model-driven definition, integration, management and testing of data services without…☆34Dec 13, 2022Updated 3 years ago
- Examples of using Spark + Kudu☆15Sep 13, 2017Updated 8 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.☆76Apr 24, 2024Updated last year
- Spark Structured Streaming JDBC Sink☆16Apr 26, 2021Updated 4 years ago
- Spark Streaming reads messages from Kafka, writes offsets to Redis, computes word frequencies with Spark, and writes the results to a Hive table☆17Jul 30, 2019Updated 6 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆73Mar 14, 2021Updated 4 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark☆46Jul 17, 2025Updated 7 months ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆122Feb 24, 2026Updated last week
- flinksql-platform☆19Mar 22, 2021Updated 4 years ago
- Capture the logical plan from Spark (SQL)☆22Mar 6, 2021Updated 4 years ago
- Real time and offline time series analysis with Spark, Spark Streaming and Storm☆21Oct 20, 2020Updated 5 years ago
- Live Training by Packt, with Jeffrey Yau.☆19Jan 30, 2023Updated 3 years ago
- ☆50Feb 11, 2020Updated 6 years ago
- Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN.☆128Sep 7, 2018Updated 7 years ago
- Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream …☆22Feb 6, 2017Updated 9 years ago
- Ansible playbook for automated HDP 2.x deployment and installation with Kerberos☆19Sep 8, 2016Updated 9 years ago