yaooqinn/itachi



A library that brings useful functions from various modern database management systems to Apache Spark

☆63

Alternatives and similar repositories for itachi

Users who are interested in itachi are comparing it to the libraries listed below.


MrPowers / bebe
Filling in the Spark function gaps across APIs
☆50 · Apr 14, 2021 · Updated 5 years ago
MrPowers / beavis
Pandas helper functions
☆31 · Feb 19, 2023 · Updated 3 years ago
MrPowers / spark-slack
Speak Slack notifications and process Slack slash commands
☆15 · Dec 20, 2018 · Updated 7 years ago
swoop-inc / spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
☆191 · Oct 15, 2025 · Updated 9 months ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated last month
G-Research / spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
☆239 · Jul 1, 2026 · Updated 3 weeks ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆96 · May 11, 2026 · Updated 2 months ago
aphp / HiveQLKernel
HiveQL Jupyter Kernel
☆10 · Aug 5, 2022 · Updated 3 years ago
SemyonSinchenko / spark-connect-example
An example of a SparkConnect extension.
☆15 · Mar 5, 2024 · Updated 2 years ago
wankunde / sql-runner
☆17 · Mar 19, 2024 · Updated 2 years ago
yaooqinn / spark-history-cli
CLI tool for querying Apache Spark History Server REST API
☆28 · Mar 22, 2026 · Updated 4 months ago
NetEase / spark-alarm
Alerting and monitoring tool for Apache Spark
☆23 · May 20, 2022 · Updated 4 years ago
oracle / spark-oracle
On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.
☆36 · Apr 15, 2025 · Updated last year
data-tools / big-data-types
A library to transform Scala product types and Schemes from different systems into other Schemes. Any implemented type automatically gets…
☆14 · Updated this week
ClickHouse / spark-clickhouse-connector
Spark ClickHouse Connector built on the DataSourceV2 API
☆217 · Updated this week
databrickslabs / delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics f…
☆42 · Nov 27, 2023 · Updated 2 years ago
Spratiher9 / JumpSpark
JumpSpark - A modern cookiecutter template for pyspark projects with batteries included.
☆10 · May 12, 2023 · Updated 3 years ago
microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago
maropu / spark-data-repair-plugin
Provides functionality to build statistical models to repair dirty tabular data in Spark
☆12 · Apr 21, 2023 · Updated 3 years ago
yaooqinn / spark-postgres
PostgreSQL and GreenPlum Data Source for Apache Spark
☆35 · May 6, 2026 · Updated 2 months ago
minio / spark-select
A library for Spark DataFrame using MinIO Select API
☆102 · Sep 27, 2019 · Updated 6 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 · Mar 14, 2021 · Updated 5 years ago
indix / sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
☆28 · May 15, 2020 · Updated 6 years ago
SemyonSinchenko / flake8-pyspark-with-column
A flake8 plugin that detects usage of withColumn in a loop or inside reduce
☆28 · Jun 20, 2025 · Updated last year
awesome-kyuubi / hadoop-testing
Testing Sandbox for Hadoop Ecosystem Components
☆45 · Jun 16, 2026 · Updated last month
apache / datafu
Mirror of Apache DataFu
☆124 · Jul 9, 2026 · Updated 2 weeks ago
SaurabhChawla100 / spark-radiant
Spark-Radiant is an Apache Spark performance and cost optimizer
☆25 · Dec 31, 2024 · Updated last year
Qbeast-io / qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
☆233 · Jan 24, 2025 · Updated last year
datamechanics / delight
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
☆345 · May 31, 2024 · Updated 2 years ago
mrpowers-io / quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
☆687 · Jun 9, 2026 · Updated last month
netease-bigdata / ne-spark-courseware
NetEase Spark Courses
☆15 · Sep 4, 2018 · Updated 7 years ago
thesquelched / spark-lineage
Spark SQL listener to record lineage information
☆28 · Jan 24, 2021 · Updated 5 years ago
mrpowers-io / levi
Delta Lake helper methods. No Spark dependency.
☆22 · Jan 19, 2026 · Updated 6 months ago
oap-project / oap-tools
Tools for building, packaging, and OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S.
☆18 · Mar 27, 2024 · Updated 2 years ago
datafusion-contrib / datafusion-objectstore-hdfs
HDFS, based on the Java implementation, as a remote ObjectStore for DataFusion
☆10 · Feb 13, 2024 · Updated 2 years ago
elastacloud / spark-excel
A Spark data source for reading Microsoft Excel files
☆13 · Jul 1, 2024 · Updated 2 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
maropu / spark-sql-server
Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol
☆34 · Sep 8, 2022 · Updated 3 years ago
MrPowers / mack
Delta Lake helper methods in PySpark
☆328 · Jan 19, 2026 · Updated 6 months ago