PySpark test helper methods with beautiful error messages
★765 · May 20, 2026 · Updated this week
Alternatives and similar repositories for chispa
Users that are interested in chispa are comparing it to the libraries listed below.
- pyspark methods to enhance developer productivity 📣 👯 🎉 · ★687 · Mar 6, 2025 · Updated last year
- Delta Lake helper methods in PySpark · ★329 · Jan 19, 2026 · Updated 4 months ago
- A Python Library to support running data quality rules while the spark job is running ⚡ · ★202 · Apr 27, 2026 · Updated 3 weeks ago
- Spark style guide · ★270 · Sep 30, 2024 · Updated last year
- This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring… · ★1,246 · Sep 8, 2025 · Updated 8 months ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow · ★227 · Apr 20, 2026 · Updated last month
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) · ★455 · Apr 2, 2026 · Updated last month
- A library that provides useful extensions to Apache Spark and PySpark. · ★238 · Mar 18, 2026 · Updated 2 months ago
- Delta Lake helper methods. No Spark dependency. · ★22 · Jan 19, 2026 · Updated 4 months ago
- Testing framework for Databricks notebooks · ★315 · Apr 20, 2024 · Updated 2 years ago
- Delta lake and filesystem helper methods · ★50 · Feb 29, 2024 · Updated 2 years ago
- Python API for Deequ · ★820 · May 9, 2026 · Updated 2 weeks ago
- Delta Lake examples · ★239 · Oct 8, 2024 · Updated last year
- Fake Pandas / PySpark DataFrame creator · ★48 · Mar 10, 2024 · Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. · ★3,617 · May 14, 2026 · Updated last week
- Essential Spark extensions and helper methods ✨😲 · ★767 · Sep 14, 2025 · Updated 8 months ago
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp… · ★823 · Updated this week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… · ★8,809 · May 16, 2026 · Updated last week
- Data Contracts engine for the modern data stack. https://www.soda.io · ★2,350 · Updated this week
- Open, Multi-modal Catalog for Data & AI · ★3,397 · May 14, 2026 · Updated last week
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. · ★21 · Jun 12, 2024 · Updated last year
- A native Rust library for Delta Lake, with bindings into Python · ★3,220 · Updated this week
- pytest plugin to run the tests with support of pyspark · ★88 · May 21, 2025 · Updated last year
- ✨ A Pydantic to PySpark schema library · ★124 · Updated this week
- Demo of using the Nutter for testing of Databricks notebooks in the CI/CD pipeline · ★152 · Aug 14, 2024 · Updated last year
- Base classes to use when writing tests with Spark · ★1,554 · Apr 20, 2026 · Updated last month
- Always know what to expect from your data. · ★11,507 · May 14, 2026 · Updated last week
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows · ★45 · Jan 24, 2026 · Updated 3 months ago
- Marshmallow serializer integration with pyspark · ★12 · Dec 29, 2023 · Updated 2 years ago
- Code samples, etc. for Databricks · ★74 · Feb 11, 2026 · Updated 3 months ago
- Scalable and efficient data transformation framework - backwards compatible with dbt. · ★3,080 · Apr 29, 2026 · Updated 3 weeks ago
- PySpark schema generator · ★44 · Feb 23, 2023 · Updated 3 years ago
- An Open Standard for lineage metadata collection · ★2,470 · Updated this week
- Implementing best practices for PySpark ETL jobs and applications. · ★2,102 · Jan 1, 2023 · Updated 3 years ago
- Column-wise type annotations for pyspark DataFrames · ★104 · Updated this week
- Yet Another (Spark) ETL Framework · ★21 · Oct 21, 2023 · Updated 2 years ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… · ★288 · Mar 4, 2026 · Updated 2 months ago
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… · ★10 · Jul 31, 2023 · Updated 2 years ago
- Drop-in replacement for Apache Spark UI · ★456 · Updated this week