Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
☆106 · Jan 22, 2024 · Updated 2 years ago
Alternatives and similar repositories for chombo
Users that are interested in chombo are comparing it to the libraries listed below.
- ☆10 · Aug 17, 2022 · Updated 3 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ☆40 · Jun 29, 2017 · Updated 8 years ago
- Apache Spark based ETL Engine ☆71 · Oct 18, 2016 · Updated 9 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 · May 13, 2026 · Updated last month
- A pyspark lib to validate data quality ☆19 · Nov 11, 2022 · Updated 3 years ago
- ☆21 · Oct 1, 2015 · Updated 10 years ago
- Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka ☆25 · Oct 16, 2020 · Updated 5 years ago
- Apache Amaterasu ☆56 · Oct 18, 2019 · Updated 6 years ago
- ☆25 · Oct 12, 2016 · Updated 9 years ago
- My branch of Apache Flume with a generic JDBC sink (not yet licensed to Apache) ☆11 · Feb 12, 2022 · Updated 4 years ago
- flinksql-platform ☆19 · Mar 22, 2021 · Updated 5 years ago
- Plot live-stats as graph from ApacheSpark application using Lightning-viz ☆18 · Jul 3, 2017 · Updated 8 years ago
- Open source task scheduler with dependency management ☆15 · Jul 1, 2018 · Updated 7 years ago
- Spark pipelines that correspond to a series of Dataflow examples. ☆27 · May 5, 2019 · Updated 7 years ago
- Spark Structured Streaming JDBC Sink ☆16 · Apr 26, 2021 · Updated 5 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆28 · May 15, 2020 · Updated 6 years ago
- An example of how to use the JDBC to issue Hive queries from a Java client application. ☆11 · Apr 5, 2018 · Updated 8 years ago
- Example API Access SmartApp that shows the state and allows control of devices ☆12 · Mar 11, 2026 · Updated 3 months ago
- docs, codes and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certific… ☆10 · Sep 25, 2019 · Updated 6 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆162 · Oct 4, 2022 · Updated 3 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆126 · Updated this week
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆63 · Sep 6, 2024 · Updated last year
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 · Sep 16, 2019 · Updated 6 years ago
- Distributed SQL query engine for running interactive analytic queries against big data sources. ☆10 · Jul 1, 2016 · Updated 9 years ago
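- Optimizes Flink multi-stream operations (e.g. join); the optimizations cover, but are not limited to, data loss and performance issues ☆11 · Apr 8, 2019 · Updated 7 years ago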
- ☆16 · Jun 27, 2020 · Updated 5 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 · Sep 8, 2022 · Updated 3 years ago
- Using JRecord to build a mapred and mapreduce inputformat for HDFS, MAPREDUCE, PIG, HIVE, Spark, ... ☆19 · Dec 7, 2017 · Updated 8 years ago
- Implementation of a Big Data (batch and stream) distributed processing engine in Java using Akka actors. ☆12 · Feb 20, 2023 · Updated 3 years ago
- Scala API for distributed closures on Apache Ignite ☆11 · Jun 6, 2015 · Updated 11 years ago
- Examples of using Spark + Kudu ☆15 · Sep 13, 2017 · Updated 8 years ago
- THIS REPOSITORY IS DEPRECATED ☆19 · Jul 6, 2023 · Updated 2 years ago
- Cloud based Data Platform based on Apache Spark ☆28 · May 21, 2026 · Updated 3 weeks ago
- Archive and analyze results from a Twitter search (**no longer maintained**) ☆33 · Mar 25, 2015 · Updated 11 years ago
- Terminal kanban on plain markdown, with optional Today/Tomorrow planner, 24h agenda + calendar, and a live Claude Code agent view. Use on… ☆78 · Jun 6, 2026 · Updated last week
- Solutions of LeetCode interview questions ☆15 · Feb 7, 2019 · Updated 7 years ago
- Specification Edition Support Tool ☆13 · Mar 3, 2015 · Updated 11 years ago
- An Ansible collection of utilities and other resources for Cloudera Platform deployments ☆13 · Updated this week
- Teiid Designer is a visual tool that enables rapid, model-driven definition, integration, management and testing of data services without… ☆35 · Dec 13, 2022 · Updated 3 years ago