Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
☆64 · Mar 23, 2026 · Updated last month
Alternatives and similar repositories for soda-spark
Users that are interested in soda-spark are comparing it to the libraries listed below.
- ☆23 · Jun 14, 2021 · Updated 4 years ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,344 · May 12, 2026 · Updated last week
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆62 · Nov 30, 2022 · Updated 3 years ago
- Apache Spark Scala utility to track data records during application execution ☆11 · Jun 12, 2023 · Updated 2 years ago
- pytest plugin to run the tests with support of pyspark ☆88 · May 21, 2025 · Updated 11 months ago
- ☆11 · Nov 26, 2024 · Updated last year
- ☆16 · Jun 11, 2020 · Updated 5 years ago
- Python API for Deequ ☆820 · May 9, 2026 · Updated last week
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆63 · Sep 6, 2024 · Updated last year
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year. ☆16 · Jan 23, 2025 · Updated last year
- rb_status_plugin : Data confidence tool for Airflow ☆12 · Jan 7, 2023 · Updated 3 years ago
- Sample code and projects for A Joyful Introduction To Clojure ☆16 · Jul 10, 2019 · Updated 6 years ago
- ETL with Azure Cookbook, published by Packt ☆12 · Jan 18, 2023 · Updated 3 years ago
- ☆19 · Oct 20, 2020 · Updated 5 years ago
- ☆30 · Apr 6, 2025 · Updated last year
- Basic Spark utilities ☆13 · Feb 20, 2025 · Updated last year
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆687 · Mar 6, 2025 · Updated last year
- IntelliJ SourcePawn Plugin ☆13 · May 9, 2022 · Updated 4 years ago
- ☆38 · Mar 18, 2026 · Updated 2 months ago
- An idiomatic Scala wrapper around the AWS Java SDK ☆22 · Dec 23, 2021 · Updated 4 years ago
- An Open Standard for lineage metadata collection ☆2,460 · Updated this week
- Spark in Action, 2nd edition - chapter 15 - Aggregating your data ☆12 · Sep 8, 2022 · Updated 3 years ago
- Helm chart for deploying Apache Airflow in kubernetes ☆19 · Aug 13, 2019 · Updated 6 years ago
- The DBT of ML, as Aligned describes data dependencies in ML systems, and reduces technical data debt ☆61 · Apr 20, 2026 · Updated 3 weeks ago
- Tutorial and examples of Data Quality in Big Data System ☆11 · Apr 25, 2017 · Updated 9 years ago
- Cloudflare worker ☆19 · Sep 16, 2022 · Updated 3 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,617 · Updated this week
- ☆10 · Jan 28, 2025 · Updated last year
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab… ☆507 · Updated this week
- Magic to help Spark pipelines upgrade ☆34 · Sep 29, 2024 · Updated last year
- Fast data quality framework for modern data infrastructure ☆29 · Apr 2, 2026 · Updated last month
- The one file simple bug tracking application that incorporates a kanban board. ☆12 · Jan 31, 2014 · Updated 12 years ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ☆194 · Jan 5, 2026 · Updated 4 months ago
- Python Package for ducklake ☆20 · Jun 5, 2025 · Updated 11 months ago
- chDB AWS Lambda container ☆18 · Aug 31, 2023 · Updated 2 years ago
- Extension package for dbt to build a metadata table for your dbt models along side your models. ☆16 · Mar 31, 2023 · Updated 3 years ago
- ☆10 · Jun 29, 2023 · Updated 2 years ago
- A cloud native data mesh implementation ☆12 · Jan 15, 2021 · Updated 5 years ago