ACID Data Source for Apache Spark based on Hive ACID
☆96Jul 7, 2021Updated 4 years ago
Alternatives and similar repositories for spark-acid
Users that are interested in spark-acid are comparing it to the libraries listed below
- Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines☆17Jan 21, 2020Updated 6 years ago
- On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.☆35Apr 15, 2025Updated 10 months ago
- PostgreSQL and Greenplum Data Source for Apache Spark☆35Jul 9, 2025Updated 7 months ago
- RocksDB state storage implementation for Structured Streaming.☆17Oct 21, 2020Updated 5 years ago
- Prescriptive Applications over Kite and Hadoop☆12Oct 14, 2015Updated 10 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources☆18Jun 28, 2021Updated 4 years ago
- A Spark datasource for the HadoopOffice library☆36Sep 29, 2025Updated 5 months ago
- Delta Lake Examples☆11Apr 24, 2020Updated 5 years ago
- ☆30Oct 15, 2019Updated 6 years ago
- Embed any webapp/website as Ambari view!☆25Feb 26, 2016Updated 10 years ago
- Convert a CSV file to ORCFile☆26Apr 10, 2019Updated 6 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.☆284Feb 24, 2026Updated last week
- ☆103Mar 23, 2020Updated 5 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.☆20Oct 11, 2021Updated 4 years ago
- Custom state store providers for Apache Spark☆92Feb 14, 2025Updated last year
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆94May 9, 2025Updated 9 months ago
- Apache Zeppelin Service for Apache Ambari Service. Installation and management of Zeppelin via Ambari.☆14Jan 23, 2016Updated 10 years ago
- Kubernetes deployment of PrestoDB, Hive Metastore, and Minio S3-standard object store☆17Oct 20, 2022Updated 3 years ago
- A simplified, lightweight ETL Framework based on Apache Spark☆587Jan 24, 2024Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆92Mar 5, 2024Updated 2 years ago
- Qubole Sparklens tool for performance tuning Apache Spark☆590Jun 26, 2024Updated last year
- Cache File System optimized for columnar formats and object stores☆187Aug 11, 2022Updated 3 years ago
- Spark Structured Streaming State Tools☆34Jul 3, 2020Updated 5 years ago
- Ansible playbook for automated HDP 2.x deployment install with Kerberos☆19Sep 8, 2016Updated 9 years ago
- Spark cloud integration: tests, cloud committers and more☆20Jan 30, 2025Updated last year
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange☆130Dec 19, 2024Updated last year
- Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.☆889Feb 9, 2026Updated 3 weeks ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆122Updated this week
- HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)☆62Sep 29, 2025Updated 5 months ago
- Example project showing how to use Hive UDFs in Apache Spark☆55Apr 23, 2019Updated 6 years ago
- StreamLine - Streaming Analytics☆167Aug 27, 2023Updated 2 years ago
- ☆63Nov 8, 2019Updated 6 years ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.☆431Jan 14, 2022Updated 4 years ago
- ☆202Feb 18, 2026Updated 2 weeks ago
- hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE☆297Jan 2, 2023Updated 3 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark☆46Jul 17, 2025Updated 7 months ago
- Discover Flink clusters on Hadoop YARN for Prometheus☆23Aug 5, 2020Updated 5 years ago
- Random implementation notes☆33Apr 23, 2013Updated 12 years ago
- Apache HBase Connectors☆248Feb 13, 2026Updated 2 weeks ago