sodadata / soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
☆63 · Updated 2 years ago
Alternatives and similar repositories for soda-spark:
Users who are interested in soda-spark are comparing it to the libraries listed below
- A Python Library to support running data quality rules while the spark job is running⚡ ☆183 · Updated last week
- Great Expectations Airflow operator ☆163 · Updated this week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆215 · Updated last week
- Delta Lake helper methods. No Spark dependency. ☆23 · Updated 7 months ago
- Library to convert DBT manifest metadata to Airflow tasks ☆48 · Updated last year
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 · Updated 2 years ago
- Airflow Providers containing Deferrable Operators & Sensors from Astronomer ☆147 · Updated this week
- Delta lake and filesystem helper methods ☆51 · Updated last year
- A repository of sample code to show data quality checking best practices using Airflow. ☆76 · Updated 2 years ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆172 · Updated last year
- Delta Lake helper methods in PySpark ☆322 · Updated 7 months ago
- A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow. ☆194 · Updated last week
- ☆43 · Updated 3 years ago
- The athena adapter plugin for dbt (https://getdbt.com) ☆140 · Updated 2 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆224 · Updated last month
- ☆199 · Updated last year
- Spark style guide ☆259 · Updated 6 months ago
- Schema modelling framework for decentralised domain-driven ownership of data. ☆252 · Updated last year
- Fast iterative local development and testing of Apache Airflow workflows ☆200 · Updated last week
- pytest plugin to run the tests with support of pyspark ☆86 · Updated last month
- Enforce Best Practices for all your Airflow DAGs. ⭐ ☆99 · Updated this week
- Delta Lake examples ☆224 · Updated 6 months ago
- Rules based grant management for Snowflake ☆40 · Updated 6 years ago
- Quick Guides from Dremio on Several topics ☆70 · Updated 3 months ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆94 · Updated this week
- The Picnic Data Vault framework. ☆126 · Updated 10 months ago
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com) ☆232 · Updated 3 weeks ago
- Apache Airflow integration for dbt ☆402 · Updated 11 months ago
- Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including… ☆152 · Updated last week
- Pytest plugin for dbt core ☆60 · Updated 3 months ago