capitalone/DataProfiler

What's in your data? Extract schema, statistics and entities from datasets

☆1,571

Alternatives and similar repositories for DataProfiler

Users who are interested in DataProfiler are comparing it to the libraries listed below.

capitalone / synthetic-data
Generating complex, nonlinear datasets appropriate for use with deep learning/black box models which 'need' nonlinearity
☆48 · Nov 24, 2025 · Updated 8 months ago
capitalone / rubicon-ml
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!
☆140 · Jul 16, 2026 · Updated last week
fivetran / great_expectations
Always know what to expect from your data.
☆11,664 · Updated this week
capitalone / ablation
Evaluating XAI methods through ablation studies.
☆19 · Dec 28, 2024 · Updated last year
capitalone / global-attribution-mapping
GAM (Global Attribution Mapping) explains the landscape of neural network predictions across subpopulations
☆37 · Jan 23, 2026 · Updated 6 months ago
capitalone / datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
☆653 · Updated this week
sodadata / soda-core
Data Contracts engine for the modern data stack. https://www.soda.io
☆2,397 · Updated this week
datafold / data-diff
Compare tables within or across databases
☆2,990 · May 17, 2024 · Updated 2 years ago
unionai-oss / pandera
A light-weight, flexible, and expressive statistical data testing library
☆4,409 · Updated this week
capitalone / edgetest
edgetest is a tox-inspired python library that will loop through your project's dependencies, and check if your project is compatible wit…
☆26 · Jul 14, 2026 · Updated last week
kedro-org / kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and…
☆10,929 · Updated this week
Data-Centric-AI-Community / fg-data-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
☆13,651 · Apr 22, 2026 · Updated 3 months ago
ploomber / ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
☆3,622 · May 29, 2025 · Updated last year
amundsen-io / amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting…
☆4,781 · Jul 1, 2026 · Updated 3 weeks ago
zillow / luminaire
Luminaire is a python package that provides ML driven solutions for monitoring time series data.
☆807 · Jun 2, 2026 · Updated last month
stitchfix / hamilton
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
☆860 · Jul 3, 2023 · Updated 3 years ago
ibis-project / ibis
the portable Python dataframe library
☆6,605 · Updated this week
sfu-db / dataprep
Open-source low code data preparation library in python. Collect, clean and visualization your data in python with a few lines of code.
☆2,248 · Jun 27, 2024 · Updated 2 years ago
vaexio / vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,510 · Apr 1, 2026 · Updated 3 months ago
whylabs / whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model perf…
☆2,828 · Jan 10, 2025 · Updated last year
evidentlyai / evidently
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. Fro…
☆7,744 · May 2, 2026 · Updated 2 months ago
dagster-io / dagster
An orchestration platform for the development, production, and observation of data assets.
☆15,883 · Updated this week
lux-org / lux
Automatically visualize your pandas dataframe via a single print! 📊 💡
☆5,378 · Mar 20, 2024 · Updated 2 years ago
orchest / orchest
Build data pipelines, the easy way 🛠️
☆4,135 · Jun 6, 2023 · Updated 3 years ago
tokern / piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
☆346 · Jan 5, 2024 · Updated 2 years ago
simonw / datasette
An open source multi-tool for exploring and publishing data
☆11,303 · Jul 14, 2026 · Updated last week
moj-analytical-services / splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
☆2,276 · Updated this week
PrefectHQ / prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
☆23,466 · Updated this week
evidence-dev / evidence
Business intelligence as code: build fast, interactive data visualizations in SQL and markdown
☆6,770 · Feb 18, 2026 · Updated 5 months ago
featureform / featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
☆1,981 · Jul 3, 2025 · Updated last year
nteract / papermill
📚 Parameterize, execute, and analyze notebooks
☆6,460 · Jul 6, 2026 · Updated 2 weeks ago
fugue-project / fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew…
☆2,170 · May 19, 2026 · Updated 2 months ago
tobymao / sqlglot
Python SQL Parser and Transpiler
☆9,454 · Updated this week
zinggAI / zingg
Scalable master data management, identity resolution, entity resolution, and deduplication using ML
☆1,235 · Updated this week
Netflix / metaflow
Build, Manage and Deploy AI/ML Systems
☆10,194 · Updated this week
man-group / dtale
Visualizer for pandas data structures
☆5,204 · Updated this week
data-describe / data-describe
data⎰describe: Pythonic EDA Accelerator for Data Science
☆302 · Feb 22, 2023 · Updated 3 years ago
rilldata / rill
The fastest business intelligence tool for humans and agents.
☆2,770 · Updated this week
pola-rs / polars
Extremely fast Query Engine for DataFrames, written in Rust
☆39,080 · Updated this week