capitalone / synthetic-data
Generating complex, nonlinear datasets appropriate for use with deep learning/black box models which 'need' nonlinearity
☆44 · Updated last year
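To show roughly what a "complex, nonlinear" synthetic dataset can look like, here is a minimal sketch that builds a toy binary classification set. Its label depends on an interaction term, a sine term, and a squared term, so no linear decision boundary separates the classes cleanly. It uses only NumPy. `make_nonlinear_dataset` and its parameters are hypothetical illustrations. They are not the API of capitalone/synthetic-data, whose interface is not described here.

```python
import numpy as np


def make_nonlinear_dataset(n_samples=1000, n_features=4, noise=0.1, seed=0):
    """Hypothetical helper (not the capitalone/synthetic-data API).

    Draws features uniformly from [-1, 1] and derives a binary label from a
    nonlinear function of the first four features plus Gaussian noise.
    """
    if n_features < 4:
        raise ValueError("this sketch needs at least 4 features")
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, n_features))
    # Nonlinear signal: a pairwise interaction, a sine, and a squared term
    signal = X[:, 0] * X[:, 1] + np.sin(np.pi * X[:, 2]) + X[:, 3] ** 2
    signal = signal + rng.normal(0.0, noise, size=n_samples)
    # Threshold at the median so the two classes are roughly balanced
    y = (signal > np.median(signal)).astype(int)
    return X, y


X, y = make_nonlinear_dataset()
print(X.shape, y.mean())
```

Thresholding at the median is just one convenient way to get balanced classes. A regression target could use `signal` directly instead.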
Alternatives and similar repositories for synthetic-data
Users who are interested in synthetic-data are comparing it to the libraries listed below.
- Capture all information throughout your model's development in a reproducible way and tie results directly to the model code! ☆132 · Updated this week
- GAM (Global Attribution Mapping) explains the landscape of neural network predictions across subpopulations ☆34 · Updated 2 months ago
- An abstraction layer for parameter tuning ☆35 · Updated 9 months ago
- Playground for integrating large language models into the Modern Data Stack for entity matching ☆108 · Updated 2 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 · Updated 2 years ago
- A software engineering framework to jump start your machine learning projects ☆37 · Updated last year
- Abstractions for feature engineering on large graphs of tabular data. ☆21 · Updated last month
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- mercury-graph is a Python library that offers graph analytics capabilities with a technology-agnostic API. ☆30 · Updated 3 months ago
- Dask integration for Snowflake ☆30 · Updated 7 months ago
- openclean: a data cleaning and data profiling library for Python ☆77 · Updated 3 years ago
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆80 · Updated last year
- DataFrame support for scikit-learn. ☆63 · Updated last year
- Build and deploy a serverless data pipeline on AWS with no effort. ☆110 · Updated 2 years ago
- SPEAR: Programmatically label and build training data quickly. ☆107 · Updated last year
- Tools and utilities for operating Metaflow in production ☆57 · Updated last month
- Linear regression in SQL using dbt ☆70 · Updated 5 months ago
- A Kedro plugin that provides drop-in replacements for the pandas datasets (e.g., modin and cuDF) ☆12 · Updated 4 years ago
- Woodwork is a Python library that provides robust methods for managing and communicating data typing information. ☆153 · Updated 3 weeks ago
- Assessing whether data from a database complies with reference information. ☆43 · Updated last week
- Curated examples and patterns for using Chalk. Use these to build your feature pipelines. ☆19 · Updated last month
- ☆22 · Updated last year
- IbisML is a library for building scalable ML pipelines using Ibis. ☆109 · Updated 6 months ago
- Your favorite Python graph libraries, scalable and interoperable. Graph databases in memory, and familiar graph APIs for cloud databases. ☆109 · Updated last month
- Unified slicing for all Python data structures. ☆35 · Updated 4 months ago
- Projects developed by Domino's R&D team ☆76 · Updated 3 years ago
- Deploy production-grade Metaflow cloud infrastructure on AWS ☆65 · Updated 2 months ago
- ByteHub: making feature stores simple ☆60 · Updated 4 years ago
- The easiest way to integrate Kedro and Great Expectations ☆52 · Updated 2 years ago
- A playground for running duckdb as a stateless query engine over a data lake ☆206 · Updated last year