Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
☆63 Jun 22, 2022 Updated 3 years ago
Alternatives and similar repositories for soda-spark
Users that are interested in soda-spark are comparing it to the libraries listed below
- ☆23 Jun 14, 2021 Updated 4 years ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,303 Updated this week
- pytest plugin to run the tests with support of pyspark ☆88 May 21, 2025 Updated 9 months ago
- Apache Spark Scala utility to track data records during application execution ☆11 Jun 12, 2023 Updated 2 years ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year. ☆15 Jan 23, 2025 Updated last year
- ☆11 Nov 26, 2024 Updated last year
- Usage examples for byte-genie API ☆12 Apr 27, 2024 Updated last year
- rb_status_plugin : Data confidence tool for Airflow ☆12 Jan 7, 2023 Updated 3 years ago
- Python client for Marquez ☆12 Dec 4, 2020 Updated 5 years ago
- Python API for Deequ ☆813 Updated this week
- ☆15 Dec 15, 2023 Updated 2 years ago
- ☆16 Jun 11, 2020 Updated 5 years ago
- An idiomatic Scala wrapper around the AWS Java SDK ☆22 Dec 23, 2021 Updated 4 years ago
- ☆19 Aug 29, 2020 Updated 5 years ago
- The DBT of ML, as Aligned describes data dependencies in ML systems and reduces technical data debt ☆60 Jan 5, 2026 Updated 2 months ago
- ☆24 Dec 4, 2023 Updated 2 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆62 Sep 6, 2024 Updated last year
- Source code for 'PySpark Recipes' by Raju Kumar Mishra ☆26 Nov 30, 2019 Updated 6 years ago
- Convert a CSV file to ORCFile ☆26 Apr 10, 2019 Updated 6 years ago
- ☆30 Apr 6, 2025 Updated 11 months ago
- Apache flink ☆77 Feb 16, 2026 Updated 3 weeks ago
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab… ☆492 Updated this week
- ☆17 Oct 20, 2020 Updated 5 years ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆62 Nov 30, 2022 Updated 3 years ago
- dbt data models for facebook ads ☆41 Dec 4, 2024 Updated last year
- Spark Structured Streaming Kinesis Data Streams connector supports both GetRecords and SubscribeToShard (Enhanced Fan-Out, EFO) ☆39 Mar 2, 2026 Updated last week
- ☆37 Updated this week
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆684 Mar 6, 2025 Updated last year
- Magic to help Spark pipelines upgrade ☆34 Sep 29, 2024 Updated last year
- Nested array transformation helper extensions for Apache Spark ☆37 Aug 4, 2023 Updated 2 years ago
- I'll munch some data here ☆12 Jun 18, 2021 Updated 4 years ago
- Java implementation of the EbMS 2.0 specification. ☆10 Feb 20, 2026 Updated 2 weeks ago
- a curated list of awesome lakehouse frameworks, applications, etc ☆42 Feb 9, 2026 Updated last month
- PDF to JSON, JSON to PDF, etc. ☆12 Apr 18, 2018 Updated 7 years ago
- Python Package to Share/Edit Pandas/Polars DF with web interface! ☆11 Jun 10, 2025 Updated 8 months ago
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆444 Jul 16, 2025 Updated 7 months ago
- ☆32 Aug 18, 2021 Updated 4 years ago
- Utility functions for dbt projects running on Spark ☆34 Dec 17, 2025 Updated 2 months ago
- Code snippets used in demos recorded for the blog. ☆38 Updated this week