Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
☆64 · Mar 23, 2026 · Updated 3 months ago
Alternatives and similar repositories for soda-spark
Users that are interested in soda-spark are comparing it to the libraries listed below.
- ☆23 · Jun 14, 2021 · Updated 5 years ago
- Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline. ☆17 · Jan 29, 2026 · Updated 5 months ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,376 · Updated this week
- ☆15 · Mar 23, 2026 · Updated 3 months ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆62 · Nov 30, 2022 · Updated 3 years ago
- Apache Spark Scala utility to track data records during application execution ☆11 · Jun 12, 2023 · Updated 3 years ago
- pytest plugin to run tests with PySpark support ☆88 · May 21, 2025 · Updated last year
- ☆16 · Jun 11, 2020 · Updated 6 years ago
- Python API for Deequ ☆822 · Jun 11, 2026 · Updated 2 weeks ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year. ☆16 · Jan 23, 2025 · Updated last year
- Run GitHub GraphQL queries and mutations in VS Code ☆13 · Apr 15, 2022 · Updated 4 years ago
- Source code for 'PySpark Recipes' by Raju Kumar Mishra ☆26 · Nov 30, 2019 · Updated 6 years ago
- rb_status_plugin: Data confidence tool for Airflow ☆12 · Jan 7, 2023 · Updated 3 years ago
- Sample code and projects for A Joyful Introduction To Clojure ☆16 · Jul 10, 2019 · Updated 6 years ago
- ETL with Azure Cookbook, published by Packt ☆12 · Jan 18, 2023 · Updated 3 years ago
- Delta Live Tables Workshop Resources ☆17 · Feb 24, 2023 · Updated 3 years ago
- ☆30 · Apr 6, 2025 · Updated last year
- Basic Spark utilities ☆13 · Feb 20, 2025 · Updated last year
- An example to illustrate using Luigi to manage a data science workflow in Greenplum Database ☆12 · Feb 5, 2019 · Updated 7 years ago
- ☆38 · Mar 18, 2026 · Updated 3 months ago
- Prefect integrations for interacting with Hightouch. ☆11 · Feb 8, 2024 · Updated 2 years ago
- An idiomatic Scala wrapper around the AWS Java SDK ☆22 · Dec 23, 2021 · Updated 4 years ago
- SQL Server DBA Code and Helpful Scripts ☆11 · Aug 16, 2012 · Updated 13 years ago
- An Open Standard for lineage metadata collection ☆2,517 · Updated this week
- Spark in Action, 2nd edition - chapter 15 - Aggregating your data ☆12 · Sep 8, 2022 · Updated 3 years ago
- Helm chart for deploying Apache Airflow in Kubernetes ☆19 · Aug 13, 2019 · Updated 6 years ago
- The DBT of ML, as Aligned describes data dependencies in ML systems and reduces technical data debt ☆61 · Apr 20, 2026 · Updated 2 months ago
- Tutorial and examples of Data Quality in Big Data System ☆11 · Apr 25, 2017 · Updated 9 years ago
- Scraper for question discussions on ExamTopics ☆12 · Dec 31, 2022 · Updated 3 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,623 · Jun 18, 2026 · Updated last week
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab… ☆514 · Updated this week
- Tools for working with CSV files in IPython. ☆10 · Feb 17, 2016 · Updated 10 years ago
- Magic to help Spark pipelines upgrade ☆34 · Updated this week
- Fast data quality framework for modern data infrastructure ☆29 · Apr 2, 2026 · Updated 2 months ago
- Spark development environment for Kubernetes, spark-submit, and Jupyter notebook ☆18 · Nov 30, 2021 · Updated 4 years ago
- Python package for ducklake ☆20 · Jun 5, 2025 · Updated last year
- chDB AWS Lambda container ☆19 · Aug 31, 2023 · Updated 2 years ago
- ☆12 · Aug 6, 2020 · Updated 5 years ago