capitalone / datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
☆580 · Updated this week
Alternatives and similar repositories for datacompy
Users who are interested in datacompy are comparing it to the libraries listed below.
- Possibly the fastest DataFrame-agnostic quality check library in town. ☆195 · Updated last week
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆169 · Updated last year
- Python API for Deequ ☆787 · Updated 3 months ago
- PySpark test helper methods with beautiful error messages ☆704 · Updated last week
- Turning PySpark Into a Universal DataFrame API ☆414 · Updated this week
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆674 · Updated 4 months ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆218 · Updated last month
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,094 · Updated 3 months ago
- Snowflake SQLAlchemy ☆250 · Updated last week
- Generate and Visualize Data Lineage from query history ☆326 · Updated last year
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆503 · Updated 5 months ago
- Snowflake Connector for Python ☆661 · Updated this week
- Distributed SQL Engine in Python using Dask ☆406 · Updated 10 months ago
- Delta Lake helper methods in PySpark ☆324 · Updated 10 months ago
- Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. ☆372 · Updated last month
- Great Expectations Airflow operator ☆167 · Updated this week
- Snowflake Snowpark Python API ☆302 · Updated this week
- Schema modelling framework for decentralised domain-driven ownership of data. ☆254 · Updated last year
- Dagster Labs' open-source data platform, built with Dagster. ☆378 · Updated last week
- Read/Write pandas DataFrames with Tableau Hyper Extracts ☆121 · Updated 2 months ago
- Apache Airflow integration for dbt ☆410 · Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 · Updated last year
- Making DAG construction easier ☆267 · Updated 2 months ago
- Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with … ☆638 · Updated 2 weeks ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆110 · Updated this week
- A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros. ☆185 · Updated 2 years ago
- PyAirbyte brings the power of Airbyte to every Python developer. ☆277 · Updated last week
- Data product portal created by Dataminded ☆186 · Updated this week
- Tool to automate data quality checks on data pipelines ☆254 · Updated 2 years ago
- ✨ A Pydantic to PySpark schema library ☆98 · Updated this week