A curated list of awesome synthetic data tools (open source and commercial).
★243 · Jan 11, 2024 · Updated 2 years ago
Alternatives and similar repositories for awesome-synthetic-data
Users who are interested in awesome-synthetic-data are comparing it to the repositories listed below
- A curated list of resources dedicated to synthetic data ★141 · Jul 29, 2022 · Updated 3 years ago
- Synthetic Data Generation with Execution-Based Verification and Grounding for LLM Training. ★19 · Feb 7, 2025 · Updated last year
- MirrorDataGenerator is a Python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ★24 · Jun 22, 2022 · Updated 3 years ago
- plait.py - a fake data modeler ★435 · Dec 27, 2018 · Updated 7 years ago
- A software package for privacy-preserving generation of a synthetic twin to a given sensitive data set. ★56 · Sep 3, 2024 · Updated last year
- A user-friendly Command & Control (C&C) web platform for remote monitoring, management, and task automation across multiple devices. ★14 · Dec 15, 2024 · Updated last year
- Synthetic data generators for structured and unstructured text, featuring differentially private learning. ★672 · Jun 24, 2025 · Updated 8 months ago
- Synthetic data generation for tabular data ★3,434 · Updated this week
- Build datasets using natural language ★568 · Sep 19, 2025 · Updated 5 months ago
- SRT Subtitles Package for Sublime Text. ★12 · Feb 25, 2015 · Updated 11 years ago
- BERT score for text generation ★12 · Jan 15, 2025 · Updated last year
- Streamlit Dashboard over Superstore Data stored in Postgres Docker container. With SQLAlchemy + Plotly Express ★13 · Oct 16, 2024 · Updated last year
- Official Implementation of Knowledge Flow Prompting ★35 · Oct 20, 2025 · Updated 4 months ago
- A simple example of VAEs with KANs ★12 · May 17, 2024 · Updated last year
- Public blueprints for data use cases ★85 · Sep 10, 2025 · Updated 5 months ago
- Synthetic data generators for tabular and time-series data ★1,612 · Updated this week
- Scan and monitor your network effortlessly! Nmap Prometheus Exporter provides insights into network health and security with Prometheus-c… ★15 · Oct 2, 2023 · Updated 2 years ago
- KL3M training data collection and preprocessing ★20 · Apr 14, 2025 · Updated 10 months ago
- (no description) ★27 · Aug 16, 2025 · Updated 6 months ago
- Standardised Metrics and Methods for Synthetic Tabular Data Evaluation ★35 · Aug 14, 2024 · Updated last year
- Proof of concept code from Gretel.ai and Illumina using generative neural networks to create synthetic versions of mouse genotype and phe… ★33 · Jan 19, 2022 · Updated 4 years ago
- the small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly ★32 · Oct 19, 2024 · Updated last year
- MyAssistant Playground, powered by Bedrock Claude & AutoGen ★12 · Mar 26, 2024 · Updated last year
- Karpathy's llama2.c transpiled to MLX for Apple Silicon ★14 · Dec 28, 2023 · Updated 2 years ago
- Legalpioneer dataset ★15 · Apr 10, 2025 · Updated 10 months ago
- Algorithms for generating synthetic data ★16 · Jun 18, 2024 · Updated last year
- Explains Canadian Bills ★17 · May 13, 2023 · Updated 2 years ago
- (no description) ★20 · Jan 10, 2024 · Updated 2 years ago
- Unstructured Data Connectors for Haystack 2.0 ★17 · Sep 21, 2023 · Updated 2 years ago
- (no description) ★39 · Sep 6, 2023 · Updated 2 years ago
- (no description) ★274 · Apr 3, 2024 · Updated last year
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ★213 · Sep 18, 2025 · Updated 5 months ago
- DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) ★206 · Feb 8, 2022 · Updated 4 years ago
- An open-source OpenAI wrapper for a RAG-based chatbot that seamlessly integrates with your documents. ★22 · Nov 27, 2024 · Updated last year
- (no description) ★43 · Dec 7, 2022 · Updated 3 years ago
- In this article, I will present an open-source AI tool for writing grant applications, using Microsoft AutoGen combined with Retrieval-Au… ★23 · Jul 19, 2025 · Updated 7 months ago
- Web application that makes data releases that satisfy differential privacy using the OpenDP Library ★22 · Aug 2, 2024 · Updated last year
- LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development ★21 · Jul 24, 2023 · Updated 2 years ago
- A CLI in Rust to generate synthetic data for MLX friendly training ★25 · Jan 13, 2024 · Updated 2 years ago