Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
☆63 · Mar 23, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for soda-spark
Users that are interested in soda-spark are comparing it to the libraries listed below.
- ☆23 · Jun 14, 2021 · Updated 4 years ago
- Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline. ☆17 · Jan 29, 2026 · Updated 2 months ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,331 · Updated this week
- ☆15 · Mar 23, 2026 · Updated 3 weeks ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆62 · Nov 30, 2022 · Updated 3 years ago
- Apache Spark Scala utility to track data records during application execution ☆11 · Jun 12, 2023 · Updated 2 years ago
- ☆11 · Nov 26, 2024 · Updated last year
- ☆16 · Jun 11, 2020 · Updated 5 years ago
- Python API for Deequ ☆815 · Mar 9, 2026 · Updated last month
- Python client for Taboola API ☆15 · Apr 2, 2021 · Updated 5 years ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year. ☆16 · Jan 23, 2025 · Updated last year
- Run GitHub GraphQL queries and mutations in VS Code ☆13 · Apr 15, 2022 · Updated 4 years ago
- A Github Action automatically assigns reviewers to PR based on changed files ☆23 · Apr 10, 2026 · Updated last week
- rb_status_plugin : Data confidence tool for Airflow ☆12 · Jan 7, 2023 · Updated 3 years ago
- Sample code and projects for A Joyful Introduction To Clojure ☆16 · Jul 10, 2019 · Updated 6 years ago
- Delta Live Tables Workshop Resources ☆17 · Feb 24, 2023 · Updated 3 years ago
- ☆30 · Apr 6, 2025 · Updated last year
- Basic Spark utilities ☆13 · Feb 20, 2025 · Updated last year
- ☆38 · Mar 18, 2026 · Updated last month
- Prefect integrations for interacting with Hightouch. ☆11 · Feb 8, 2024 · Updated 2 years ago
- Python client for Marquez ☆12 · Dec 4, 2020 · Updated 5 years ago
- jQuery autocomplete ☆18 · Sep 1, 2015 · Updated 10 years ago
- An Open Standard for lineage metadata collection ☆2,412 · Updated this week
- SQL Server DBA Code and Helpful Scripts ☆11 · Aug 16, 2012 · Updated 13 years ago
- Helm chart for deploying Apache Airflow in kubernetes ☆19 · Aug 13, 2019 · Updated 6 years ago
- Spark in Action, 2nd edition - chapter 15 - Aggregating your data ☆12 · Sep 8, 2022 · Updated 3 years ago
- The DBT of ML, as Aligned describes data dependencies in ML systems, and reduce technical data debt ☆61 · Jan 5, 2026 · Updated 3 months ago
- Fast, zero-copy HTML Parser written in Rust ☆27 · Dec 6, 2025 · Updated 4 months ago
- Some .NET samples demonstrating how to use the Selenium WebDriver to perform BDD tests and compare screenshots with PhantomJS ☆12 · Feb 23, 2015 · Updated 11 years ago
- Autodoc for PG SQL files ☆18 · Jul 14, 2023 · Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,605 · Apr 1, 2026 · Updated 2 weeks ago
- ☆10 · Jan 28, 2025 · Updated last year
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab… ☆501 · Updated this week
- Extension package for dbt to build a metadata table for your dbt models along side your models. ☆15 · Mar 31, 2023 · Updated 3 years ago
- This repository holds the content of the starting page and the "glue" pages for the official TYPO3 documentation on https://docs.typo3.or… ☆16 · Mar 6, 2026 · Updated last month
- Fast data quality framework for modern data infrastructure ☆29 · Apr 2, 2026 · Updated 2 weeks ago
- The one file simple bug tracking application that incorporates a kanban board. ☆12 · Jan 31, 2014 · Updated 12 years ago
- Execute with Python Virtual Environment Activated ☆17 · Nov 12, 2024 · Updated last year