Clearbox-AI / StructuredDataProfiling
A Python library to check for data quality and automatically generate data tests.
☆42 · Updated last year
Alternatives and similar repositories for StructuredDataProfiling
Users who are interested in StructuredDataProfiling are comparing it to the libraries listed below.
- A Python library to perform NER on structured data and generate PII with Faker ☆30 · Updated last year
- Swiple enables you to easily observe, understand, validate and improve the quality of your data ☆84 · Updated this week
- Explore and compare 1K+ accurate decision trees in your browser! ☆165 · Updated last year
- Possibly the fastest DataFrame-agnostic quality check library in town. ☆195 · Updated last week
- First-party plugins maintained by the Kedro team. ☆104 · Updated this week
- A software engineering framework to jump start your machine learning projects ☆37 · Updated last year
- openclean - Data Cleaning and data profiling library for Python ☆78 · Updated 3 years ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆54 · Updated 2 weeks ago
- Type System for Data Analysis in Python ☆213 · Updated 5 months ago
- Set up a Cost-Effective Modern Data Stack for a Charity ☆19 · Updated 3 months ago
- A tool to automatically infer columns data types in .csv files ☆35 · Updated 2 years ago
- ⚓ Eurybia monitors model drift over time and securizes model deployment with data validation ☆211 · Updated 8 months ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆503 · Updated 5 months ago
- A write-audit-publish implementation on a data lake without the JVM ☆46 · Updated 11 months ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 · Updated 7 months ago
- Kedro extension for VSCode including LSP and other features ☆20 · Updated 3 months ago
- A portable Datamart and Business Intelligence suite built with Docker, sqlmesh + dbtcore, DuckDB and Superset ☆52 · Updated 8 months ago
- dagster scikit-learn pipeline example. ☆44 · Updated 2 years ago
- An open-source Python library for the assessment of utility and privacy performance of any tabular synthetic dataset. ☆23 · Updated last month
- Opinionated JSON to CSV/XLSX/SQLITE/PARQUET converter. Flattens JSON fast. ☆197 · Updated 3 weeks ago
- Python Data Anonymization & Masking Library For Data Science Tasks ☆269 · Updated 2 years ago
- Start a data science project with modern tools ☆198 · Updated last year
- Runnable ☆40 · Updated this week
- A toolbox 🧰 for Jupyter notebooks 📙: testing, experiment tracking, debugging, profiling, and more! ☆67 · Updated 9 months ago
- Cloud-agnostic Python API ☆60 · Updated last year
- A fast reader for messy CSV files with optional type inference. ☆17 · Updated last month
- Assessing whether data from database complies with reference information. ☆43 · Updated last week
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 · Updated last year
- Woodwork is a Python library that provides robust methods for managing and communicating data typing information. ☆154 · Updated last week
- Jupyter Widget for Lux ☆76 · Updated 2 years ago