capitalone / datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
☆631 Updated this week
Alternatives and similar repositories for datacompy
Users who are interested in datacompy are comparing it to the libraries listed below.
- Python API for Deequ ☆810 Updated 2 weeks ago
- PySpark test helper methods with beautiful error messages ☆752 Updated 3 weeks ago
- Turning PySpark Into a Universal DataFrame API ☆485 Updated this week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆226 Updated last week
- Possibly the fastest DataFrame-agnostic quality check library in town. ☆236 Updated this week
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆682 Updated 11 months ago
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 Updated 2 years ago
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,136 Updated this week
- Great Expectations Airflow operator ☆170 Updated last week
- Delta Lake helper methods in PySpark ☆327 Updated 3 weeks ago
- Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. ☆375 Updated 8 months ago
- Data Contracts engine for the modern data stack. https://www.soda.io ☆2,281 Updated this week
- Snowflake Connector for Python ☆707 Updated this week
- Read/Write pandas DataFrames with Tableau Hyper Extracts ☆121 Updated 4 months ago
- Better SQL in Jupyter. 📊 ☆840 Updated last month
- Snowflake SQLAlchemy ☆261 Updated this week
- Snowflake Snowpark Python API ☆325 Updated this week
- Apache Airflow integration for dbt ☆411 Updated last year
- Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with … ☆652 Updated this week
- CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer ☆437 Updated this week
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆510 Updated last month
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆225 Updated 9 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 Updated 3 months ago
- Generate and Visualize Data Lineage from query history ☆328 Updated 2 years ago
- A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros. ☆186 Updated 2 years ago
- ✨ A Pydantic to PySpark schema library ☆118 Updated this week
- Making DAG construction easier ☆283 Updated last month
- ☆81 Updated 11 months ago
- Dagster Labs' open-source data platform, built with Dagster. ☆435 Updated this week
- Tool to automate data quality checks on data pipelines ☆256 Updated 3 years ago