great-expectations / great_expectations
Always know what to expect from your data.
☆10,334 · Updated last week
Alternatives and similar repositories for great_expectations:
Users who are interested in great_expectations are comparing it to the libraries listed below.
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆10,276 · Updated last week
- dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build application… ☆10,685 · Updated this week
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ☆2,067 · Updated this week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ☆4,557 · Updated 3 weeks ago
- An orchestration platform for the development, production, and observation of data assets. ☆12,990 · Updated this week
- Build, Manage and Deploy AI/ML Systems ☆8,742 · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,406 · Updated last week
- re_data - fix data issues before your users & CEO would discover them 😊 ☆1,562 · Updated 11 months ago
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ☆19,062 · Updated this week
- An Open Standard for lineage metadata collection ☆1,917 · Updated this week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,116 · Updated last week
- Collect, aggregate, and visualize a data ecosystem's metadata ☆1,903 · Updated 2 weeks ago
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,074 · Updated 3 weeks ago
- A light-weight, flexible, and expressive statistical data testing library ☆3,762 · Updated this week
- data load tool (dlt) is an open source Python library that makes data loading easy 🛠️ ☆3,490 · Updated this week
- the portable Python dataframe library ☆5,705 · Updated this week
- Build data pipelines, the easy way 🛠️ ☆4,113 · Updated last year
- Parallel computing with task scheduling ☆13,136 · Updated last week
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,371 · Updated 6 months ago
- 📚 Parameterize, execute, and analyze notebooks ☆6,137 · Updated 2 weeks ago
- The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ ☆3,557 · Updated 7 months ago
- A next-generation curated knowledge sharing platform for data scientists and other technical professions. ☆5,513 · Updated 7 months ago
- A modular SQL linter and auto-formatter with support for multiple dialects and templated code. ☆8,800 · Updated this week
- The Open Source Feature Store for AI/ML ☆5,975 · Updated this week
- Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to wr… ☆2,037 · Updated this week
- 🧙 Build, run, and manage data pipelines for integrating and transforming data. ☆8,262 · Updated last week
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ☆12,869 · Updated this week
- Python Stream Processing ☆6,787 · Updated 8 months ago
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ☆18,234 · Updated 2 months ago
- Utility functions for dbt projects. ☆1,507 · Updated 3 weeks ago