Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
☆63Mar 23, 2026Updated this week
Alternatives and similar repositories for soda-spark
Users that are interested in soda-spark are comparing it to the libraries listed below.
- ☆23Jun 14, 2021Updated 4 years ago
- Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.☆17Jan 29, 2026Updated 2 months ago
- Data Contracts engine for the modern data stack. https://www.soda.io☆2,311Updated this week
- ☆15Updated this week
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html☆62Nov 30, 2022Updated 3 years ago
- Apache Spark Scala utility to track data records during application execution☆11Jun 12, 2023Updated 2 years ago
- pytest plugin to run the tests with support of pyspark☆88May 21, 2025Updated 10 months ago
- ☆11Nov 26, 2024Updated last year
- ☆16Jun 11, 2020Updated 5 years ago
- Python API for Deequ☆813Mar 9, 2026Updated 3 weeks ago
- Easy way to define, execute and store quality rules for your data.☆18Dec 15, 2023Updated 2 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…☆62Sep 6, 2024Updated last year
- Python client for Taboola API☆15Apr 2, 2021Updated 4 years ago
- ☆17Oct 20, 2020Updated 5 years ago
- Run GitHub GraphQL queries and mutations in VS Code☆13Apr 15, 2022Updated 3 years ago
- Source code for 'PySpark Recipes' by Raju Kumar Mishra☆26Nov 30, 2019Updated 6 years ago
- rb_status_plugin : Data confidence tool for Airflow☆12Jan 7, 2023Updated 3 years ago
- Getting Great Expectations setup to run on DataBricks with Spark Dataframes.☆13Jun 2, 2022Updated 3 years ago
- Delta Live Tables Workshop Resources☆17Feb 24, 2023Updated 3 years ago
- ☆30Apr 6, 2025Updated 11 months ago
- Basic Spark utilities☆13Feb 20, 2025Updated last year
- pyspark methods to enhance developer productivity 📣 👯 🎉☆685Mar 6, 2025Updated last year
- ☆38Mar 18, 2026Updated last week
- A demo repo for some unusual and user-unfriendly behaviour with AWS environment variable encryption☆11Oct 30, 2021Updated 4 years ago
- Prefect integrations for interacting with Hightouch.☆11Feb 8, 2024Updated 2 years ago
- An idiomatic Scala wrapper around the AWS Java SDK☆22Dec 23, 2021Updated 4 years ago
- Package Angular and Spring Boot into a single JAR!☆12Feb 27, 2023Updated 3 years ago
- Standalone Apache Spark installer for Linux systems (Debian- and RHEL-based)☆13Dec 10, 2024Updated last year
- An Open Standard for lineage metadata collection☆2,375Updated this week
- SQL Server DBA Code and Helpful Scripts☆11Aug 16, 2012Updated 13 years ago
- jQuery autocomplete☆18Sep 1, 2015Updated 10 years ago
- Helm chart for deploying Apache Airflow in kubernetes☆19Aug 13, 2019Updated 6 years ago
- The DBT of ML: Aligned describes data dependencies in ML systems and reduces technical data debt☆61Jan 5, 2026Updated 2 months ago
- Fast, zero-copy HTML Parser written in Rust☆25Dec 6, 2025Updated 3 months ago
- native Go library for Delta Lake☆10Jul 31, 2022Updated 3 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.☆3,596Mar 21, 2026Updated last week
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite☆15Aug 22, 2019Updated 6 years ago
- ☆10Jan 28, 2025Updated last year
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab…☆497Updated this week