capitalone / datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
☆619 · Updated last week
Alternatives and similar repositories for datacompy
Users who are interested in datacompy are comparing it to the libraries listed below.
- Python API for Deequ ☆806 · Updated 8 months ago
- Possibly the fastest DataFrame-agnostic quality check library in town. ☆229 · Updated last month
- PySpark test helper methods with beautiful error messages ☆735 · Updated last week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆222 · Updated last week
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆509 · Updated 3 months ago
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 · Updated 2 years ago
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,127 · Updated last week
- Turning PySpark Into a Universal DataFrame API ☆462 · Updated last week
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆676 · Updated 9 months ago
- Great Expectations Airflow operator ☆169 · Updated last week
- Snowflake Snowpark Python API ☆319 · Updated this week
- Apache Airflow integration for dbt ☆411 · Updated last year
- Read/Write pandas DataFrames with Tableau Hyper Extracts ☆121 · Updated 2 months ago
- Snowflake SQLAlchemy ☆260 · Updated last week
- Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. ☆376 · Updated 6 months ago
- Snowflake Connector for Python ☆698 · Updated last week
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,537 · Updated last year
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆224 · Updated 7 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 · Updated last month
- Making DAG construction easier ☆281 · Updated 2 months ago
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ☆2,248 · Updated last week
- Dagster Labs' open-source data platform, built with Dagster. ☆421 · Updated last week
- Delta Lake helper methods in PySpark ☆325 · Updated last year
- Distributed SQL Engine in Python using Dask ☆408 · Updated last year
- Generate and Visualize Data Lineage from query history ☆326 · Updated 2 years ago
- Clean APIs for data cleaning. Python implementation of R package Janitor ☆1,470 · Updated last week
- dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org) ☆1,199 · Updated this week
- A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros. ☆186 · Updated 2 years ago
- Schema modelling framework for decentralised domain-driven ownership of data. ☆259 · Updated 2 years ago
- Data pipeline with dbt, Airflow, Great Expectations ☆165 · Updated 4 years ago