Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
☆106Jan 22, 2024Updated 2 years ago
Alternatives and similar repositories for chombo
Users that are interested in chombo are comparing it to the libraries listed below.
- ☆10Aug 17, 2022Updated 3 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark☆40Jun 29, 2017Updated 8 years ago
- Apache Spark based ETL Engine☆71Oct 18, 2016Updated 9 years ago
- Scala API for Apache Spark SQL high-order functions☆14Aug 4, 2023Updated 2 years ago
- A pyspark lib to validate data quality☆19Nov 11, 2022Updated 3 years ago
- ☆21Oct 1, 2015Updated 10 years ago
- Apache Amaterasu☆56Oct 18, 2019Updated 6 years ago
- ☆25Oct 12, 2016Updated 9 years ago
- Open source task scheduler with dependency management☆15Jul 1, 2018Updated 7 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark☆46Jul 17, 2025Updated 10 months ago
- Spark pipelines that correspond to a series of Dataflow examples.☆27May 5, 2019Updated 7 years ago
- Spark Structured Streaming JDBC Sink☆16Apr 26, 2021Updated 5 years ago
- Repository of Notebooks taken from https://neo4j.com/graph-algorithms-book/☆26Feb 21, 2020Updated 6 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌☆28May 15, 2020Updated 6 years ago
- Real time and offline time series analysis with Spark, Spark Streaming and Storm☆21Oct 20, 2020Updated 5 years ago
- An example of how to use the JDBC to issue Hive queries from a Java client application.☆11Apr 5, 2018Updated 8 years ago
- ☆32Mar 21, 2018Updated 8 years ago
- Example API Access SmartApp that shows the state and allows control of devices☆12Mar 11, 2026Updated 2 months ago
- Terraform provider for interacting with NiFi cluster☆51May 29, 2019Updated 6 years ago
- Build configuration-driven ETL pipelines on Apache Spark☆162Oct 4, 2022Updated 3 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆125May 12, 2026Updated 2 weeks ago
- Apache Spark ETL Utilities☆39Oct 23, 2024Updated last year
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…☆63Sep 6, 2024Updated last year
- Machine Learning Stack for Big Data, Big Cluster and Big Challenges☆22Sep 6, 2018Updated 7 years ago
- Kairos, combines a focused crawler and an information extraction engine, to convert a list of conference websites into a index filled wit…☆19Feb 20, 2011Updated 15 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆74Mar 14, 2021Updated 5 years ago
- Distributed SQL query engine for running interactive analytic queries against big data sources.☆10Jul 1, 2016Updated 9 years ago
- Optimizes Flink multi-stream operations (e.g. join); optimizations cover, but are not limited to, data loss and performance issues☆11Apr 8, 2019Updated 7 years ago
- ☆16Jun 27, 2020Updated 5 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol☆34Sep 8, 2022Updated 3 years ago
- Using JRecord to build a mapred and mapreduce inputformat for HDFS, MAPREDUCE, PIG, HIVE, Spark, ...☆19Dec 7, 2017Updated 8 years ago
- Spark package for checking data quality☆221Feb 28, 2020Updated 6 years ago
- Semiautomatic annotation editor for rich html editors.☆60Apr 30, 2013Updated 13 years ago
- Implementation of a Big Data (batch and stream) distributed processing engine in Java using Akka actors.☆12Feb 20, 2023Updated 3 years ago
- A fork of cascading patterns, but implemented for trident☆72Dec 16, 2023Updated 2 years ago
- Scala API for distributed closures on Apache Ignite☆11Jun 6, 2015Updated 10 years ago
- Drools processor for Apache NiFi☆39Oct 23, 2019Updated 6 years ago
- THIS REPOSITORY IS DEPRECATED☆19Jul 6, 2023Updated 2 years ago
- Cloud based Data Platform based on Apache Spark☆28Updated this week