PySpark test helper methods with beautiful error messages
★759 · Apr 8, 2026 · Updated this week
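The "beautiful error messages" chispa is known for come from printing an aligned, row-by-row diff when two DataFrames differ (its real entry point is `chispa.assert_df_equality` on PySpark DataFrames). Below is a minimal, Spark-free sketch of that idea on plain lists of row tuples; the function names here are illustrative, not chispa's API:

```python
# Illustrative sketch only: mimics the side-by-side row diff that helpers
# like chispa print on assertion failure, without any Spark dependency.

def diff_report(actual_rows, expected_rows):
    """Build a side-by-side report, flagging mismatched row pairs with '!'."""
    lines = ["  actual               | expected"]
    for a, e in zip(actual_rows, expected_rows):
        marker = " " if a == e else "!"
        lines.append(f"{marker} {str(a):<20} | {e}")
    return "\n".join(lines)

def assert_rows_equal(actual_rows, expected_rows):
    """Raise with a readable diff when the row lists differ."""
    if actual_rows != expected_rows:
        raise AssertionError("Rows are not equal:\n"
                             + diff_report(actual_rows, expected_rows))

if __name__ == "__main__":
    good = [("alice", 1), ("bob", 2)]
    bad = [("alice", 1), ("bob", 3)]
    assert_rows_equal(good, good)   # passes silently
    try:
        assert_rows_equal(good, bad)
    except AssertionError as err:
        print(err)                  # report marks the mismatched row with '!'
```

The real library does this comparison on `DataFrame.collect()` output and also checks schemas, but the failure-reporting idea is the same.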
Alternatives and similar repositories for chispa
Users interested in chispa are comparing it to the libraries listed below.
- pyspark methods to enhance developer productivity ★687 · Mar 6, 2025 · Updated last year
- Delta Lake helper methods in PySpark ★328 · Jan 19, 2026 · Updated 2 months ago
- A Python library to support running data quality rules while the Spark job is running ★201 · Updated this week
- Spark style guide ★272 · Sep 30, 2024 · Updated last year
- This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring… ★1,240 · Sep 8, 2025 · Updated 7 months ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ★227 · Mar 30, 2026 · Updated last week
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ★455 · Apr 2, 2026 · Updated last week
- A library that provides useful extensions to Apache Spark and PySpark. ★235 · Mar 18, 2026 · Updated 3 weeks ago
- Delta Lake helper methods. No Spark dependency. ★22 · Jan 19, 2026 · Updated 2 months ago
- Testing framework for Databricks notebooks ★315 · Apr 20, 2024 · Updated last year
- Delta Lake and filesystem helper methods ★50 · Feb 29, 2024 · Updated 2 years ago
- Python API for Deequ ★815 · Mar 9, 2026 · Updated last month
- Delta Lake examples ★239 · Oct 8, 2024 · Updated last year
- Fake Pandas / PySpark DataFrame creator ★48 · Mar 10, 2024 · Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ★3,603 · Apr 1, 2026 · Updated last week
- Essential Spark extensions and helper methods ★766 · Sep 14, 2025 · Updated 6 months ago
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp… ★820 · Apr 1, 2026 · Updated last week
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ★8,746 · Updated this week
- Data Contracts engine for the modern data stack. https://www.soda.io ★2,326 · Updated this week
- Open, Multi-modal Catalog for Data & AI ★3,354 · Updated this week
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ★21 · Jun 12, 2024 · Updated last year
- A native Rust library for Delta Lake, with bindings into Python ★3,184 · Updated this week
- pytest plugin to run tests with support of pyspark ★88 · May 21, 2025 · Updated 10 months ago
- A Pydantic to PySpark schema library ★123 · Updated this week
- Demo of using Nutter for testing Databricks notebooks in a CI/CD pipeline ★152 · Aug 14, 2024 · Updated last year
- Always know what to expect from your data. ★11,391 · Updated this week
- Base classes to use when writing tests with Spark ★1,551 · Updated this week
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows ★45 · Jan 24, 2026 · Updated 2 months ago
- Marshmallow serializer integration with pyspark ★12 · Dec 29, 2023 · Updated 2 years ago
- Code samples, etc. for Databricks ★73 · Feb 11, 2026 · Updated 2 months ago
- Scalable and efficient data transformation framework - backwards compatible with dbt. ★3,022 · Apr 3, 2026 · Updated last week
- PySpark schema generator ★44 · Feb 23, 2023 · Updated 3 years ago
- An Open Standard for lineage metadata collection ★2,396 · Updated this week
- Implementing best practices for PySpark ETL jobs and applications. ★2,094 · Jan 1, 2023 · Updated 3 years ago
- Column-wise type annotations for pyspark DataFrames ★99 · Updated this week
- Yet Another (Spark) ETL Framework ★21 · Oct 21, 2023 · Updated 2 years ago
- Drop-in replacement for Apache Spark UI ★438 · Updated this week
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ★286 · Mar 4, 2026 · Updated last month
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ★10 · Jul 31, 2023 · Updated 2 years ago