PySpark test helper methods with beautiful error messages
☆770 · May 20, 2026 · Updated last month
Alternatives and similar repositories for chispa
Users that are interested in chispa are comparing it to the libraries listed below.
- pyspark methods to enhance developer productivity ☆686 · Jun 9, 2026 · Updated 3 weeks ago
- Delta Lake helper methods in PySpark ☆329 · Jan 19, 2026 · Updated 5 months ago
- A Python Library to support running data quality rules while the spark job is running ⚡ ☆201 · Updated this week
- Spark style guide ☆270 · Sep 30, 2024 · Updated last year
- This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring… ☆1,249 · Sep 8, 2025 · Updated 9 months ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆227 · Updated this week
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆458 · Apr 2, 2026 · Updated 3 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆238 · Jun 5, 2026 · Updated 3 weeks ago
- Delta Lake helper methods. No Spark dependency. ☆22 · Jan 19, 2026 · Updated 5 months ago
- Testing framework for Databricks notebooks ☆315 · Apr 20, 2024 · Updated 2 years ago
- Delta lake and filesystem helper methods ☆51 · Feb 29, 2024 · Updated 2 years ago
- Python API for Deequ ☆822 · Jun 11, 2026 · Updated 3 weeks ago
- Delta Lake examples ☆238 · Oct 8, 2024 · Updated last year
- Fake Pandas / PySpark DataFrame creator ☆48 · Mar 10, 2024 · Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,625 · Jun 25, 2026 · Updated last week
- Essential Spark extensions and helper methods ✨ ☆767 · Jun 22, 2026 · Updated last week
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp… ☆826 · May 19, 2026 · Updated last month
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆8,882 · Updated this week
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,379 · Updated this week
- Open, Multi-modal Catalog for Data & AI ☆3,436 · Jun 17, 2026 · Updated 2 weeks ago
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 · Jun 12, 2024 · Updated 2 years ago
- A native Rust library for Delta Lake, with bindings into Python ☆3,248 · Updated this week
- ✨ A Pydantic to PySpark schema library ☆127 · Jun 23, 2026 · Updated last week
- Demo of using the Nutter for testing of Databricks notebooks in the CI/CD pipeline ☆152 · Aug 14, 2024 · Updated last year
- Base classes to use when writing tests with Spark ☆1,551 · Apr 20, 2026 · Updated 2 months ago
- Always know what to expect from your data. ☆11,603 · Updated this week
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows ☆45 · Jan 24, 2026 · Updated 5 months ago
- Marshmallow serializer integration with pyspark ☆12 · Dec 29, 2023 · Updated 2 years ago
- Code samples, etc. for Databricks ☆74 · Feb 11, 2026 · Updated 4 months ago
- Scalable and efficient data transformation framework - backwards compatible with dbt. ☆3,159 · Updated this week
- PySpark schema generator ☆44 · Feb 23, 2023 · Updated 3 years ago
- An Open Standard for lineage metadata collection ☆2,517 · Jun 25, 2026 · Updated last week
- Implementing best practices for PySpark ETL jobs and applications. ☆2,111 · Jan 1, 2023 · Updated 3 years ago
- Column-wise type annotations for pyspark DataFrames ☆107 · Updated this week
- Yet Another (Spark) ETL Framework ☆21 · Oct 21, 2023 · Updated 2 years ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆289 · Jun 3, 2026 · Updated last month
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 · Jul 31, 2023 · Updated 2 years ago
- Drop-in replacement for Apache Spark UI ☆472 · Jun 2, 2026 · Updated last month
- A flake8 plugin that detects usage of withColumn in a loop or inside reduce ☆28 · Jun 20, 2025 · Updated last year