Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
☆64 · Mar 23, 2026 · Updated 2 months ago
Alternatives and similar repositories for soda-spark
Users that are interested in soda-spark are comparing it to the libraries listed below.
- ☆23 · Jun 14, 2021 · Updated 4 years ago
- Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline. ☆17 · Jan 29, 2026 · Updated 4 months ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,366 · Updated this week
- ☆15 · Mar 23, 2026 · Updated 2 months ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆62 · Nov 30, 2022 · Updated 3 years ago
- pytest plugin to run the tests with support of pyspark ☆88 · May 21, 2025 · Updated last year
- ☆11 · Nov 26, 2024 · Updated last year
- ☆16 · Jun 11, 2020 · Updated 5 years ago
- Python API for Deequ ☆819 · Updated this week
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆63 · Sep 6, 2024 · Updated last year
- Python client for Taboola API ☆15 · Apr 2, 2021 · Updated 5 years ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year. ☆16 · Jan 23, 2025 · Updated last year
- Run GitHub GraphQL queries and mutations in VS Code ☆13 · Apr 15, 2022 · Updated 4 years ago
- Source code for 'PySpark Recipes' by Raju Kumar Mishra ☆26 · Nov 30, 2019 · Updated 6 years ago
- rb_status_plugin: Data confidence tool for Airflow ☆12 · Jan 7, 2023 · Updated 3 years ago
- Sample code and projects for A Joyful Introduction To Clojure ☆16 · Jul 10, 2019 · Updated 6 years ago
- Getting Great Expectations set up to run on Databricks with Spark DataFrames. ☆13 · Jun 2, 2022 · Updated 4 years ago
- ☆20 · Oct 20, 2020 · Updated 5 years ago
- Delta Live Tables Workshop Resources ☆17 · Feb 24, 2023 · Updated 3 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆687 · Mar 6, 2025 · Updated last year
- An example to illustrate using Luigi to manage a data science workflow in Greenplum Database ☆12 · Feb 5, 2019 · Updated 7 years ago
- ☆38 · Mar 18, 2026 · Updated 2 months ago
- Prefect integrations for interacting with Hightouch. ☆11 · Feb 8, 2024 · Updated 2 years ago
- An idiomatic Scala wrapper around the AWS Java SDK ☆22 · Dec 23, 2021 · Updated 4 years ago
- Standalone Apache Spark installer for Linux systems (Debian- and RHEL-based) ☆13 · Dec 10, 2024 · Updated last year
- Sample of different ways to call Azure Functions which may be longer than 2 minutes from Azure Logic Apps ☆14 · Aug 20, 2020 · Updated 5 years ago
- Python client for Marquez ☆12 · Dec 4, 2020 · Updated 5 years ago
- SQL Server DBA Code and Helpful Scripts ☆11 · Aug 16, 2012 · Updated 13 years ago
- An Open Standard for lineage metadata collection ☆2,497 · Updated this week
- The dbt of ML: Aligned describes data dependencies in ML systems and reduces technical data debt ☆61 · Apr 20, 2026 · Updated last month
- Fast, zero-copy HTML Parser written in Rust ☆30 · Dec 6, 2025 · Updated 6 months ago
- Autodoc for PG SQL files ☆18 · Jul 14, 2023 · Updated 2 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,618 · May 29, 2026 · Updated last week
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab… ☆511 · Updated this week
- Magic to help Spark pipelines upgrade ☆33 · Sep 29, 2024 · Updated last year
- This repository holds the content of the starting page and the "glue" pages for the official TYPO3 documentation on https://docs.typo3.or… ☆16 · May 14, 2026 · Updated 3 weeks ago
- dbt data models for facebook ads ☆41 · Dec 4, 2024 · Updated last year
- Fast data quality framework for modern data infrastructure ☆29 · Apr 2, 2026 · Updated 2 months ago