capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
★550 · Updated this week
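To illustrate the kind of join-and-diff comparison that datacompy (`datacompy.Compare(df1, df2, join_columns=...)`) automates, here is a minimal plain-pandas sketch of the same idea; the frames, the `acct_id` key, and the `balance` column are made-up toy data, not part of datacompy's API.

```python
import pandas as pd

# Two toy frames to compare; column names are illustrative only.
df1 = pd.DataFrame({"acct_id": [1, 2, 3], "balance": [100.0, 200.0, 300.0]})
df2 = pd.DataFrame({"acct_id": [1, 2, 4], "balance": [100.0, 250.0, 400.0]})

# Outer-join on the key, then bucket rows by where they appear.
merged = pd.merge(
    df1, df2, on="acct_id", how="outer",
    suffixes=("_df1", "_df2"), indicator=True,
)
only_df1 = merged[merged["_merge"] == "left_only"]    # rows missing from df2
only_df2 = merged[merged["_merge"] == "right_only"]   # rows missing from df1
both = merged[merged["_merge"] == "both"]
mismatched = both[both["balance_df1"] != both["balance_df2"]]  # value diffs
```

datacompy wraps this pattern (plus column-level tolerance checks and a human-readable `report()`) for pandas, Polars, Spark, and Snowpark frames.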
Alternatives and similar repositories for datacompy:
Users interested in datacompy often compare it to the libraries listed below.
- Python API for Deequ ★764 · Updated 2 weeks ago
- PySpark methods to enhance developer productivity ★669 · Updated last month
- PySpark test helper methods with beautiful error messages ★685 · Updated last week
- Snowflake Connector for Python ★623 · Updated this week
- Snowflake Snowpark Python API ★293 · Updated this week
- Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with … ★631 · Updated last week
- Snowflake SQLAlchemy ★247 · Updated 3 weeks ago
- Turning PySpark Into a Universal DataFrame API ★385 · Updated this week
- Monitor the stability of a Pandas or Spark dataframe ★500 · Updated 2 months ago
- Create HTML profiling reports from Apache Spark DataFrames ★196 · Updated 5 years ago
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ★167 · Updated last year
- dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org) ★1,049 · Updated last week
- Distributed SQL Engine in Python using Dask ★401 · Updated 7 months ago
- Fast iterative local development and testing of Apache Airflow workflows ★200 · Updated this week
- Possibly the fastest DataFrame-agnostic quality check library in town. ★186 · Updated last week
- Python implementation of the Parquet columnar file format. ★822 · Updated 3 weeks ago
- Dynamically generate Apache Airflow DAGs from YAML configuration files ★1,277 · Updated this week
- Port(ish) of Great Expectations to dbt test macros ★1,158 · Updated 4 months ago
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ★2,063 · Updated this week
- Great Expectations Airflow operator ★162 · Updated last week
- Schema modelling framework for decentralised domain-driven ownership of data. ★252 · Updated last year
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ★2,072 · Updated 3 weeks ago
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks ★426 · Updated 2 months ago
- A SQL port of Python's scikit-learn preprocessing module, provided as cross-database dbt macros. ★184 · Updated last year
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ★215 · Updated this week
- Dagster Labs' open-source data platform, built with Dagster. ★342 · Updated this week
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ★113 · Updated last year
- CLI tool for dbt users to simplify creation of staging models (yml and sql) files ★261 · Updated this week
- Python automatic data quality check toolkit ★283 · Updated 4 years ago
- Macros that generate dbt code ★550 · Updated 2 weeks ago