capitalone / synthetic-data
Generating complex, nonlinear datasets appropriate for use with deep learning/black box models which 'need' nonlinearity
☆44 Updated last week
Alternatives and similar repositories for synthetic-data
Users who are interested in synthetic-data are comparing it to the libraries listed below.
- Capture all information throughout your model's development in a reproducible way and tie results directly to the model code! ☆137 Updated last week
- A library of Reversible Data Transforms ☆126 Updated last week
- Public blueprints for data use cases ☆84 Updated last week
- Abstractions for feature engineering on large graphs of tabular data. ☆22 Updated last week
- GAM (Global Attribution Mapping) explains the landscape of neural network predictions across subpopulations ☆34 Updated last month
- A software engineering framework to jump start your machine learning projects ☆37 Updated last year
- Assessing whether data from a database complies with reference information. ☆43 Updated last week
- DataFrame support for scikit-learn. ☆63 Updated this week
- Metafeature Extraction for Unstructured Data ☆103 Updated 6 months ago
- Playground for integrating large language models into the Modern Data Stack for entity matching ☆108 Updated 2 years ago
- What's in your data? Extract schema, statistics and entities from datasets ☆1,516 Updated last week
- A Kedro plugin that provides drop-in replacements for the pandas datasets (e.g. modin and cuDF) ☆12 Updated 4 years ago
- Synthetic data generators for structured and unstructured text, featuring differentially private learning. ☆657 Updated 2 months ago
- Explore and compare 1K+ accurate decision trees in your browser! ☆166 Updated last year
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆55 Updated 2 months ago
- Dask integration for Snowflake ☆30 Updated last month
- stratx is a library for A Stratification Approach to Partial Dependence for Codependent Variables ☆66 Updated last year
- Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊 ☆79 Updated 11 months ago
- Build and deploy a serverless data pipeline on AWS with no effort. ☆111 Updated 2 years ago
- Metrics to evaluate quality and efficacy of synthetic datasets. ☆245 Updated 3 weeks ago
- A Causal AI package for causal graphs. ☆60 Updated 5 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 Updated last year
- An abstraction layer for parameter tuning ☆35 Updated last year
- The easiest way to integrate Kedro and Great Expectations ☆54 Updated 2 years ago
- ☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow. ☆45 Updated 6 months ago
- Personal finance project to automatically collect Swiss banking transactions into a DWH and visualise them ☆26 Updated last year
- A playground for running duckdb as a stateless query engine over a data lake ☆211 Updated last year
- Tries to shrink your Pandas column dtypes with no data loss so you have more spare RAM ☆84 Updated last year
- 🚀 Stream inferences of real-time ML models in production to any data lake (Experimental) ☆81 Updated 3 years ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated 2 years ago