yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆61 · Sep 4, 2023 · Updated 2 years ago
Alternatives and similar repositories for itachi
Users that are interested in itachi are comparing it to the libraries listed below
- Filling in the Spark function gaps across APIs ☆50 · Apr 14, 2021 · Updated 4 years ago
- Pandas helper functions ☆31 · Feb 19, 2023 · Updated 2 years ago
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆186 · Oct 15, 2025 · Updated 3 months ago
- Essential Spark extensions and helper methods ✨😲 ☆766 · Sep 14, 2025 · Updated 4 months ago
- Alerting and monitoring tool for Apache Spark ☆23 · May 20, 2022 · Updated 3 years ago
- Delta lake and filesystem helper methods ☆50 · Feb 29, 2024 · Updated last year
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 · Nov 4, 2024 · Updated last year
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆94 · May 9, 2025 · Updated 9 months ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆29 · May 15, 2020 · Updated 5 years ago
- On the fly, translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes. ☆35 · Apr 15, 2025 · Updated 9 months ago
- Paper: A Zero-rename committer for object stores ☆20 · Nov 7, 2025 · Updated 3 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆232 · Jan 20, 2026 · Updated 3 weeks ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆62 · Sep 6, 2024 · Updated last year
- Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer. ☆37 · Jan 3, 2023 · Updated 3 years ago
- PostgreSQL and GreenPlum Data Source for Apache Spark ☆35 · Jul 9, 2025 · Updated 7 months ago
- Spark ClickHouse Connector built on DataSourceV2 API ☆211 · Feb 1, 2026 · Updated last week
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆432 · Jan 14, 2022 · Updated 4 years ago
- HDFS based on Java implementation as a remote ObjectStore for DataFusion ☆10 · Feb 13, 2024 · Updated 2 years ago
- HiveQL Jupyter Kernel ☆10 · Aug 5, 2022 · Updated 3 years ago
- A library to transform Scala product types and Schemes from different systems into other Schemes. Any implemented type automatically gets… ☆14 · Updated this week
- Movie Recommendation System Using Spark ML, Akka and Cassandra ☆12 · Oct 4, 2019 · Updated 6 years ago
- An example of SparkConnect extension. ☆15 · Mar 5, 2024 · Updated last year
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆347 · May 31, 2024 · Updated last year
- A better bridge between Apache Spark and PostgreSQL ☆23 · Sep 11, 2023 · Updated 2 years ago
- A Spark plugin for reading and writing Excel files ☆520 · Feb 4, 2026 · Updated last week
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary! ☆235 · Jan 24, 2025 · Updated last year
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 · Mar 14, 2021 · Updated 4 years ago
- A Spark datasource for the HadoopCryptoLedger library ☆13 · Sep 29, 2025 · Updated 4 months ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Apr 21, 2023 · Updated 2 years ago
- Spark* Shuffle plugin to support shuffling through remote persistent memory over fabrics, which leverages the RDMA network and remote pe… ☆14 · Sep 18, 2023 · Updated 2 years ago
- Usage examples for byte-genie API ☆12 · Apr 27, 2024 · Updated last year
- ☆19 · Mar 15, 2024 · Updated last year
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 · Dec 31, 2024 · Updated last year
- Write property based tests easily on spark dataframes ☆20 · Jan 19, 2024 · Updated 2 years ago
- Mirror of Apache DataFu ☆121 · May 20, 2025 · Updated 8 months ago
- Delta Lake helper methods in PySpark ☆327 · Jan 19, 2026 · Updated 3 weeks ago
- Tools for building, packaging, and OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S. ☆18 · Mar 27, 2024 · Updated last year
- A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | The project has been migrated to Apa… ☆182 · Apr 6, 2022 · Updated 3 years ago
- SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark. ☆136 · Mar 6, 2023 · Updated 2 years ago