pranab / chombo
Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
☆102Updated last year
Alternatives and similar repositories for chombo
Users who are interested in chombo are comparing it to the repositories listed below
- spark + drools☆102Updated 3 years ago
- Build configuration-driven ETL pipelines on Apache Spark☆159Updated 2 years ago
- Helpful user defined functions / table generating functions for Hive☆101Updated 9 years ago
- Spark structured streaming with Kafka data source and writing to Cassandra☆62Updated 5 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark☆41Updated 7 years ago
- StreamLine - Streaming Analytics☆164Updated last year
- Example project showing how to use Hive UDFs in Apache Spark☆55Updated 6 years ago
- Code from an Apache Flink™ talk I regularly give☆44Updated 6 years ago
- An example Apache Beam project.☆111Updated 8 years ago
- Quark is a data virtualization engine over analytic databases.☆98Updated 7 years ago
- High performance HBase / Spark SQL engine☆28Updated 2 years ago
- Based off the design of SparkOnHBase. This Repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu.☆50Updated 9 years ago
- Apache Spark based ETL Engine☆71Updated 8 years ago
- ☆49Updated 5 years ago
- Remedy small files by combining them into larger ones.☆193Updated 2 years ago
- Ambari service for Apache Zeppelin notebook☆71Updated 7 years ago
- Demos around Ambari Views, Services, Blueprints☆63Updated 9 years ago
- Spark SQL index for Parquet tables☆134Updated 4 years ago
- Mirror of Apache Atlas (Incubating)☆94Updated 2 years ago
- ☆105Updated 5 years ago
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark☆51Updated 11 years ago
- ACID Data Source for Apache Spark based on Hive ACID☆97Updated 3 years ago
- Apache Spark and Apache Kafka integration example☆124Updated 7 years ago
- Hadoop MapReduce tool to convert Avro data files to Parquet format.☆34Updated 12 years ago
- Star Schema Benchmark using the Hive / Druid Integration☆30Updated 7 years ago
- DataQuality for BigData☆144Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆88Updated last year
- Spark Streaming HBase Example☆96Updated 9 years ago
- Spark 3.0.0 Structured Streaming Kafka Avro Demo☆15Updated 2 years ago
- ElasticSearch integration for Apache Spark☆47Updated 9 years ago