A curated list of awesome synthetic data tools (open source and commercial).
★254 · Jan 11, 2024 · Updated 2 years ago
Alternatives and similar repositories for awesome-synthetic-data
Users that are interested in awesome-synthetic-data are comparing it to the libraries listed below.
- A curated list of resources dedicated to synthetic data · ★142 · Jul 29, 2022 · Updated 3 years ago
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I… · ★24 · Jun 22, 2022 · Updated 3 years ago
- Synthetic data generation for tabular data · ★3,497 · Updated this week
- Standardised Metrics and Methods for Synthetic Tabular Data Evaluation · ★37 · Aug 14, 2024 · Updated last year
- Synthetic Data Generation with Execution-Based Verification and Grounding for LLM Training. · ★21 · Feb 7, 2025 · Updated last year
- ★44 · Dec 7, 2022 · Updated 3 years ago
- ★27 · Aug 16, 2025 · Updated 9 months ago
- Build datasets using natural language · ★575 · Sep 19, 2025 · Updated 8 months ago
- KL3M training data collection and preprocessing · ★22 · Apr 14, 2025 · Updated last year
- A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation. · ★663 · Apr 21, 2026 · Updated last month
- A conversational finance assistant that provides users with real-time stock quotes, market news, and insights on market movers through na… · ★17 · Apr 26, 2025 · Updated last year
- MyAssistant Playground, powered by Bedrock Claude & AutoGen · ★12 · Mar 26, 2024 · Updated 2 years ago
- AI Bill of Materials through source code scanning · ★81 · May 5, 2026 · Updated 3 weeks ago
- ★275 · Apr 3, 2024 · Updated 2 years ago
- DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) · ★206 · Feb 8, 2022 · Updated 4 years ago
- The open-source adapter for working with RDF databases and SPARQL queries in Jupyter notebooks leveraging the yFiles Graphs for Jupyter p… · ★24 · Apr 4, 2025 · Updated last year
- A reading list on LLM based Synthetic Data Generation 🔥 · ★1,534 · Jun 5, 2025 · Updated 11 months ago
- Synthetic Data Generation for mixed-type, multivariate time series. · ★123 · Feb 23, 2026 · Updated 3 months ago
- A PyMOL plugin with accompanying Docker image for kinase inhibitor binding and affinity prediction · ★12 · Jun 3, 2024 · Updated last year
- ★15 · Apr 29, 2025 · Updated last year
- A Shared Nearest Neighbors clustering implementation. This code is basically a wrapper of sklearn DBSCAN, implementing the neighborhood s… · ★16 · Jan 10, 2022 · Updated 4 years ago
- A repository for analytics and machine learning case studies · ★13 · Jan 4, 2026 · Updated 4 months ago
- In this article, I will present an open-source AI tool for writing grant applications, using Microsoft AutoGen combined with Retrieval-Au… · ★24 · Jul 19, 2025 · Updated 10 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes · ★217 · Sep 18, 2025 · Updated 8 months ago
- Tofu is a Python tool for generating synthetic UK Biobank data. · ★70 · Jul 25, 2023 · Updated 2 years ago
- ★32 · Mar 21, 2023 · Updated 3 years ago
- Linear Algebra for Machine Learning Book Exercises · ★13 · May 19, 2019 · Updated 7 years ago
- AI Multi-agent system for real-time, adaptive supply chain coordination and optimization leveraging responsive AI clusters. · ★37 · Mar 28, 2024 · Updated 2 years ago
- This is a curated list of research on diffusion models for tabular data, and serves as the official repository for the survey paper "Diff… · ★86 · May 18, 2026 · Updated last week
- Dynamic modeling for business, economics, and ecology using the system dynamics approach · ★32 · Apr 4, 2024 · Updated 2 years ago
- A curated list of materials on AI guardrails · ★53 · Jun 3, 2025 · Updated 11 months ago
- Explains Canadian Bills · ★17 · May 13, 2023 · Updated 3 years ago
- ★40 · Mar 20, 2025 · Updated last year
- An MCP server implementation enabling LLMs to work with new APIs and frameworks · ★45 · Oct 17, 2025 · Updated 7 months ago
- nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium-sized datasets · ★70 · Feb 22, 2023 · Updated 3 years ago
- Our Process for Llama2 Finetuning · ★16 · Sep 8, 2023 · Updated 2 years ago
- A simple Jax implementation of influence functions. · ★20 · Apr 9, 2024 · Updated 2 years ago
- Tools and service for differentially private processing of tabular and relational data · ★296 · May 1, 2026 · Updated 3 weeks ago
- Public blueprints for data use cases · ★85 · Sep 10, 2025 · Updated 8 months ago