capitalone / synthetic-data
Generating complex, nonlinear datasets appropriate for use with deep learning/black box models which 'need' nonlinearity
☆44 · Updated 8 months ago
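The one-line description above does not say how the repository builds its datasets. The sketch below only illustrates the general idea of a dataset whose label depends nonlinearly on its features. It does not use or reproduce the capitalone/synthetic-data API. It is a minimal, standard-library-only example, and the helper `make_nonlinear_dataset`, its parameters, the target function and the 0.25 threshold are all hypothetical choices made for illustration.

```python
import math
import random


def make_nonlinear_dataset(n_samples: int, n_features: int = 3, noise: float = 0.1, seed: int = 0):
    """Return (X, y) where the binary label y depends nonlinearly on the first three features.

    Hypothetical helper for illustration only; not the capitalone/synthetic-data API.
    Any features beyond the first three are pure noise columns.
    """
    if n_features < 3:
        raise ValueError("need at least 3 features for this target function")
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        x1, x2, x3 = row[0], row[1], row[2]
        # Periodic interaction term plus a quadratic term: neither is linear in the inputs
        signal = math.sin(math.pi * x1) * x2 + x3 ** 2
        score = signal + rng.gauss(0.0, noise)  # add Gaussian noise to the clean signal
        X.append(row)
        y.append(1 if score > 0.25 else 0)  # hypothetical threshold turns the score into a label
    return X, y


X, y = make_nonlinear_dataset(n_samples=1000, n_features=5, seed=42)
print(len(X), len(X[0]), sum(y))
```

Because the label depends on a product of features and on a squared feature, a purely linear classifier cannot separate the classes cleanly, which is the property that motivates nonlinear synthetic data for testing deep learning and other black-box models.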
Alternatives and similar repositories for synthetic-data:
Users who are interested in synthetic-data are comparing it to the libraries listed below
- GAM (Global Attribution Mapping) explains the landscape of neural network predictions across subpopulations ☆33 · Updated last month
- Capture all information throughout your model's development in a reproducible way and tie results directly to the model code! ☆131 · Updated 3 months ago
- What's in your data? Extract schema, statistics and entities from datasets ☆1,458 · Updated 2 weeks ago
- ☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow. ☆45 · Updated 5 months ago
- Deploy a Prefect flow to a serverless AWS Lambda function ☆36 · Updated 2 years ago
- Kedro-Accelerator speeds up pipelines by parallelizing I/O in the background. ☆35 · Updated 2 years ago
- Tools and utilities for operating Metaflow in production ☆50 · Updated last week
- Build your feature store with macros right within your dbt repository ☆38 · Updated 2 years ago
- Sentiment and language detection for text analytics. ☆16 · Updated 7 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 · Updated last year
- The Federated Model Aggregation (FMA) Service is a collection of installable Python components that make up the generic workflow/infrastr… ☆31 · Updated last month
- Tutorials for Fugue, a unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆112 · Updated 10 months ago
- A playground for running DuckDB as a stateless query engine over a data lake ☆184 · Updated last year
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- Anovos: an open-source library for scalable feature engineering using Apache Spark ☆76 · Updated last year
- hooqu is a library built on top of pandas-like DataFrames for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 · Updated 2 months ago
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ☆21 · Updated 2 years ago
- Templates for your Kedro projects. ☆71 · Updated last week
- Example project demonstrating deployment patterns for real-time streaming workflows with Prefect 2.0 ☆44 · Updated 2 years ago
- Inspect ML pipelines in Python in the form of a DAG ☆70 · Updated 11 months ago
- Demo of how to use Prefect with Docker ☆25 · Updated 2 years ago
- First-party plugins maintained by the Kedro team. ☆96 · Updated last week
- A curated list of example code to collect data from Web APIs using DataPrep.Connector. ☆35 · Updated last year
- A Kedro plugin that provides pandas drop-in replacements for the pandas datasets (e.g. Modin and cuDF) ☆12 · Updated 4 years ago
- Python stream processing for analytics ☆31 · Updated this week
- A write-audit-publish implementation on a data lake without the JVM ☆46 · Updated 6 months ago
- A software engineering framework to jump-start your machine learning projects ☆37 · Updated 8 months ago
- Python library for FeatureOps ☆64 · Updated this week
- Data pipelines from reusable components ☆108 · Updated last year