Big Data ETL and Utilities for Hadoop MapReduce, Spark and Storm
☆105 · Jan 22, 2024 · Updated 2 years ago
Alternatives and similar repositories for chombo
Users interested in chombo compare it to the libraries listed below.
- ☆10 · Aug 17, 2022 · Updated 3 years ago
- A lightweight Kafka-to-HDFS/S3 ETL library based on Apache Spark · ☆40 · Jun 29, 2017 · Updated 8 years ago
- Apache Spark based ETL engine · ☆71 · Oct 18, 2016 · Updated 9 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark · ☆29 · Nov 4, 2024 · Updated last year
- Library and framework for building fast, scalable, fault-tolerant data APIs based on Akka, Avro, ZooKeeper and Kafka · ☆25 · Oct 16, 2020 · Updated 5 years ago
- Apache Amaterasu · ☆56 · Oct 18, 2019 · Updated 6 years ago
- ☆25 · Oct 12, 2016 · Updated 9 years ago
- My branch of Apache Flume with a generic JDBC sink (not yet licensed to Apache) · ☆11 · Feb 12, 2022 · Updated 4 years ago
- flinksql-platform · ☆19 · Mar 22, 2021 · Updated 5 years ago
- Plot live stats as graphs from an Apache Spark application using Lightning-viz · ☆18 · Jul 3, 2017 · Updated 8 years ago
- Open-source task scheduler with dependency management · ☆15 · Jul 1, 2018 · Updated 7 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark · ☆46 · Jul 17, 2025 · Updated 8 months ago
- Spark pipelines that correspond to a series of Dataflow examples · ☆27 · May 5, 2019 · Updated 6 years ago
- Spark Structured Streaming JDBC sink · ☆16 · Apr 26, 2021 · Updated 4 years ago
- Repository of notebooks taken from https://neo4j.com/graph-algorithms-book/ · ☆26 · Feb 21, 2020 · Updated 6 years ago
- Real-time and offline time series analysis with Spark, Spark Streaming and Storm · ☆21 · Oct 20, 2020 · Updated 5 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark · ☆76 · Apr 24, 2024 · Updated last year
- An example of how to use JDBC to issue Hive queries from a Java client application · ☆11 · Apr 5, 2018 · Updated 8 years ago
- Docs, code and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certific… · ☆10 · Sep 25, 2019 · Updated 6 years ago
- Terraform provider for interacting with a NiFi cluster · ☆51 · May 29, 2019 · Updated 6 years ago
- Build configuration-driven ETL pipelines on Apache Spark · ☆162 · Oct 4, 2022 · Updated 3 years ago
- Smart automation tool for building modern data lakes and data pipelines · ☆123 · Updated this week
- Apache Spark ETL utilities · ☆39 · Oct 23, 2024 · Updated last year
- Distributed optimization framework with a parameter server · ☆23 · Jun 14, 2015 · Updated 10 years ago
- Machine Learning Stack for Big Data, Big Cluster and Big Challenges · ☆22 · Sep 6, 2018 · Updated 7 years ago
- Multi-stage, config-driven, SQL-based ETL framework using PySpark · ☆26 · Sep 16, 2019 · Updated 6 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures · ☆73 · Mar 14, 2021 · Updated 5 years ago
- Distributed SQL query engine for running interactive analytic queries against big data sources · ☆10 · Jul 1, 2016 · Updated 9 years ago
- Optimizes Flink's multi-stream operations (e.g. join), addressing issues including but not limited to data loss and performance · ☆11 · Apr 8, 2019 · Updated 7 years ago
- ☆16 · Jun 27, 2020 · Updated 5 years ago
- Yet another Spark SQL JDBC/ODBC server based on the PostgreSQL v3 protocol · ☆34 · Sep 8, 2022 · Updated 3 years ago
- Using JRecord to build mapred and mapreduce InputFormats for HDFS, MapReduce, Pig, Hive, Spark, ... · ☆19 · Dec 7, 2017 · Updated 8 years ago
- Implementation of a big data (batch and stream) distributed processing engine in Java using Akka actors · ☆12 · Feb 20, 2023 · Updated 3 years ago
- A fork of cascading patterns, but implemented for Trident · ☆71 · Dec 16, 2023 · Updated 2 years ago
- Scala API for distributed closures on Apache Ignite · ☆11 · Jun 6, 2015 · Updated 10 years ago
- Drools processor for Apache NiFi · ☆39 · Oct 23, 2019 · Updated 6 years ago
- Data ingestion examples · ☆11 · Feb 12, 2015 · Updated 11 years ago
- Java event log collector for Hadoop and frameworks · ☆41 · Mar 25, 2025 · Updated last year
- THIS REPOSITORY IS DEPRECATED · ☆19 · Jul 6, 2023 · Updated 2 years ago