Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
☆103 · Jan 22, 2024 · Updated 2 years ago
Alternatives and similar repositories for chombo
Users that are interested in chombo are comparing it to the libraries listed below.
- ☆10 · Aug 17, 2022 · Updated 3 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ☆40 · Jun 29, 2017 · Updated 8 years ago
- Apache Spark based ETL Engine ☆71 · Oct 18, 2016 · Updated 9 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- Scala API for Apache Spark SQL high-order functions ☆14 · Aug 4, 2023 · Updated 2 years ago
- A pyspark lib to validate data quality ☆18 · Nov 11, 2022 · Updated 3 years ago
- ☆21 · Oct 1, 2015 · Updated 10 years ago
- ☆11 · Apr 10, 2014 · Updated 11 years ago
- Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka ☆24 · Oct 16, 2020 · Updated 5 years ago
- Apache Amaterasu ☆56 · Oct 18, 2019 · Updated 6 years ago
- ☆25 · Oct 12, 2016 · Updated 9 years ago
- flinksql-platform ☆19 · Mar 22, 2021 · Updated 5 years ago
- Spark pipelines that correspond to a series of Dataflow examples. ☆27 · May 5, 2019 · Updated 6 years ago
- Spark Structured Streaming JDBC Sink ☆16 · Apr 26, 2021 · Updated 4 years ago
- Repository of Notebooks taken from https://neo4j.com/graph-algorithms-book/ ☆26 · Feb 21, 2020 · Updated 6 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆29 · May 15, 2020 · Updated 5 years ago
- Real time and offline time series analysis with Spark, Spark Streaming and Storm ☆21 · Oct 20, 2020 · Updated 5 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆76 · Apr 24, 2024 · Updated last year
- An example of how to use the JDBC to issue Hive queries from a Java client application. ☆11 · Apr 5, 2018 · Updated 7 years ago
- docs, codes and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certific… ☆10 · Sep 25, 2019 · Updated 6 years ago
- Build configuration-driven ETL pipelines on Apache Spark ☆161 · Oct 4, 2022 · Updated 3 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆62 · Sep 6, 2024 · Updated last year
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 · Sep 16, 2019 · Updated 6 years ago
- Kairos, combines a focused crawler and an information extraction engine, to convert a list of conference websites into a index filled wit… ☆18 · Feb 20, 2011 · Updated 15 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 · Mar 14, 2021 · Updated 5 years ago
- Distributed SQL query engine for running interactive analytic queries against big data sources. ☆10 · Jul 1, 2016 · Updated 9 years ago
- ☆16 · Jun 27, 2020 · Updated 5 years ago
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 · Sep 8, 2022 · Updated 3 years ago
- Using JRecord to build a mapred and mapreduce inputformat for HDFS, MAPREDUCE, PIG, HIVE, Spark, ... ☆19 · Dec 7, 2017 · Updated 8 years ago
- Spatial search using Elastic Search