sodadata / soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
☆63Jun 22, 2022Updated 3 years ago
Alternatives and similar repositories for soda-spark
Users that are interested in soda-spark are comparing it to the libraries listed below
- ☆23Jun 14, 2021Updated 4 years ago
- Data Contracts engine for the modern data stack. https://www.soda.io☆2,288Updated this week
- pytest plugin to run the tests with support of pyspark☆88May 21, 2025Updated 8 months ago
- 🥪💾 A sample of data from the `jaffle-shop-generator` that powers the Jaffle Shop spanning one year.☆14Jan 23, 2025Updated last year
- ☆11Nov 26, 2024Updated last year
- rb_status_plugin : Data confidence tool for Airflow☆12Jan 7, 2023Updated 3 years ago
- Python API for Deequ☆810Jan 21, 2026Updated 3 weeks ago
- ☆15Dec 15, 2023Updated 2 years ago
- An idiomatic Scala wrapper around the AWS Java SDK☆22Dec 23, 2021Updated 4 years ago
- The DBT of ML, as Aligned describes data dependencies in ML systems, and reduces technical data debt☆60Jan 5, 2026Updated last month
- ☆24Dec 4, 2023Updated 2 years ago
- ☆28Feb 6, 2026Updated last week
- ☆30Apr 6, 2025Updated 10 months ago
- Convert a CSV file to ORCFile☆26Apr 10, 2019Updated 6 years ago
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab…☆484Feb 4, 2026Updated last week
- ☆17Oct 20, 2020Updated 5 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉☆682Mar 6, 2025Updated 11 months ago
- Nested array transformation helper extensions for Apache Spark☆37Aug 4, 2023Updated 2 years ago
- Dissertation and thesis template in LaTeX☆13Oct 23, 2017Updated 8 years ago
- Python Package to Share/Edit Pandas/Polars DF with web interface!☆11Jun 10, 2025Updated 8 months ago
- a curated list of awesome lakehouse frameworks, applications, etc☆40Updated this week
- I'll munch some data here☆12Jun 18, 2021Updated 4 years ago
- This repository has moved into https://github.com/dbt-labs/dbt-adapters☆443Jul 16, 2025Updated 6 months ago
- Utility functions for dbt projects running on Spark☆34Dec 17, 2025Updated last month
- Data validation library for PySpark 3.0.0☆33Nov 11, 2022Updated 3 years ago
- Code snippets used in demos recorded for the blog.☆37Jan 17, 2026Updated 3 weeks ago
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs …☆160Dec 10, 2022Updated 3 years ago
- This web scraper is intended to extract data from The Home Depot website; it can be run locally or on the Apify platform, the latter is…☆10Oct 13, 2022Updated 3 years ago
- ☆11Mar 27, 2024Updated last year
- Github action for running python unit tests☆10Jun 16, 2025Updated 7 months ago
- An awesome list that curates the best Flet tools, tutorials, blogs and more.☆10Jan 8, 2023Updated 3 years ago
- The Data Product Specification☆11Jan 28, 2025Updated last year
- A Scala library for locality sensitive hashing☆14Aug 1, 2018Updated 7 years ago
- ☆10Jan 28, 2025Updated last year
- Manage Unity Catalog tables with Pydantic Models☆10Mar 5, 2025Updated 11 months ago
- How to customize Tableau authentication using AWS Athena's JDBC Credentials Provider capabilities.☆14Jun 8, 2020Updated 5 years ago
- Architecture principles☆13May 23, 2025Updated 8 months ago
- Playground site for creating/validating data contracts☆11Aug 9, 2025Updated 6 months ago
- prebuilt configurations for docker-rpm-builder☆11Feb 5, 2021Updated 5 years ago