Delta Lake and filesystem helper methods
☆50 · Feb 29, 2024 · Updated 2 years ago
Alternatives and similar repositories for jodie
Users who are interested in jodie are comparing it to the libraries listed below.
- Delta Lake helper methods. No Spark dependency. ☆22 · Jan 19, 2026 · Updated last month
- Delta Acceptance Testing ☆23 · Aug 25, 2025 · Updated 6 months ago
- ☆13 · Oct 4, 2023 · Updated 2 years ago
- Write property based tests easily on Spark DataFrames ☆20 · Jan 19, 2024 · Updated 2 years ago
- Delta Lake helper methods in PySpark ☆327 · Jan 19, 2026 · Updated last month
- A library that brings useful functions from various modern database management systems to Apache Spark ☆61 · Sep 4, 2023 · Updated 2 years ago
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 · Jun 12, 2024 · Updated last year
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 · Jul 31, 2023 · Updated 2 years ago
- PySpark methods to enhance developer productivity 📣 👯 🎉 ☆683 · Mar 6, 2025 · Updated last year
- An example of SparkConnect extension. ☆15 · Mar 5, 2024 · Updated 2 years ago
- SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine. ☆26 · Feb 22, 2025 · Updated last year
- Apache NiFi deployment on OpenShift ☆13 · Jul 18, 2023 · Updated 2 years ago
- PowerShell Scripts for Power BI ☆13 · Sep 20, 2023 · Updated 2 years ago
- Delta Lake Documentation ☆53 · Jun 19, 2024 · Updated last year
- Command line client for the Fugue API ☆14 · Mar 7, 2023 · Updated 3 years ago
- A tool to generate PySpark schema from JSON. ☆28 · Jan 21, 2024 · Updated 2 years ago
- PySpark test helper methods with beautiful error messages ☆753 · Feb 25, 2026 · Updated last week
- Optics for Spark DataFrames ☆47 · Mar 5, 2021 · Updated 5 years ago
- Spark Monitoring ☆13 · Feb 28, 2023 · Updated 3 years ago
- Delta Lake examples ☆240 · Oct 8, 2024 · Updated last year
- Code that was used as an example during the Data+AI Summit 2020 ☆15 · Mar 8, 2021 · Updated 4 years ago
- PySpark phonetic and string matching algorithms ☆41 · Feb 19, 2024 · Updated 2 years ago
- PySpark schema generator ☆44 · Feb 23, 2023 · Updated 3 years ago
- Writing PySpark logs in Apache Spark and Databricks ☆17 · Jun 13, 2022 · Updated 3 years ago
- Type safety for Spark columns ☆79 · Oct 27, 2025 · Updated 4 months ago
- ☆19 · Jan 17, 2025 · Updated last year
- A Delta Lake reader for Dask ☆53 · Jul 29, 2025 · Updated 7 months ago
- A Minimalistic Rust Implementation of Delta Sharing Server. ☆98 · Mar 17, 2025 · Updated 11 months ago
- Example Power BI files ☆18 · Sep 17, 2024 · Updated last year
- Filling in the Spark function gaps across APIs ☆50 · Apr 14, 2021 · Updated 4 years ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆454 · Feb 8, 2026 · Updated last month
- An example PySpark project with pytest ☆18 · Oct 13, 2017 · Updated 8 years ago
- Code Repository for Talk: Managing an Akka Cluster on Kubernetes ☆23 · Jul 11, 2019 · Updated 6 years ago
- ☆24 · Dec 20, 2022 · Updated 3 years ago
- [under development] ETL materials to support proposal for CDM enhancements for clinical trial data ☆24 · Jun 25, 2021 · Updated 4 years ago
- ☆59 · Jan 3, 2024 · Updated 2 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆62 · Sep 6, 2024 · Updated last year
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta Lake, Postgres + Debezium CDC, MySQL,… ☆28 · May 19, 2025 · Updated 9 months ago
- Spark operator deployment and usage on OpenShift ☆29 · Nov 25, 2024 · Updated last year