capitalone / synthetic-data
Generating complex, nonlinear datasets appropriate for use with deep learning/black box models which 'need' nonlinearity
☆44 Updated 9 months ago
Alternatives and similar repositories for synthetic-data:
Users who are interested in synthetic-data are comparing it to the libraries listed below.
- Capture all information throughout your model's development in a reproducible way and tie results directly to the model code! ☆132 Updated 4 months ago
- GAM (Global Attribution Mapping) explains the landscape of neural network predictions across subpopulations ☆33 Updated 2 months ago
- What's in your data? Extract schema, statistics and entities from datasets ☆1,470 Updated this week
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- Helper code to interact with Rasgo via our SDK, PyRasgo ☆40 Updated 2 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 Updated 2 years ago
- A write-audit-publish implementation on a data lake without the JVM ☆46 Updated 7 months ago
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- real-time data + ML pipeline ☆54 Updated last month
- Assessing whether data from database complies with reference information. ☆42 Updated 2 weeks ago
- Abstractions for feature engineering on large graphs of tabular data. ☆21 Updated this week
- Deploy production-grade Metaflow cloud infrastructure on AWS ☆65 Updated 2 months ago
- Demo repository to lambda-fy your dbt runs ☆11 Updated last year
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆53 Updated 6 months ago
- Deploy a Prefect flow to serverless AWS Lambda function ☆36 Updated 2 years ago
- Dask integration for Snowflake ☆30 Updated 4 months ago
- ☆51 Updated this week
- Linear regression in SQL using dbt ☆69 Updated 2 months ago
- ☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow. ☆45 Updated 2 weeks ago
- Example Dagster Cloud code for the Hooli Data Engineering organization. ☆1 Updated last week
- Data models for Fivetran's Netsuite connector, built using dbt. ☆39 Updated last week
- Public-facing example applications built using the Snowflake Native App Framework ☆42 Updated 2 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 Updated 11 months ago
- Template Dagster repo using poetry and a single Docker container; works well with CICD ☆67 Updated 2 years ago
- Package designed for handling imbalanced classification ☆18 Updated last year
- This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer. ☆13 Updated 3 months ago
- Apache datasketches ☆28 Updated 3 weeks ago
- openclean - Data Cleaning and data profiling library for Python ☆74 Updated 3 years ago
- Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner ☆71 Updated 3 years ago
- Example project demonstrating deployment patterns for real-time streaming workflows with Prefect 2.0 ☆44 Updated 2 years ago