statice / awesome-synthetic-data
A curated list of awesome synthetic data tools (open source and commercial).
☆239 Jan 11, 2024 Updated 2 years ago
Alternatives and similar repositories for awesome-synthetic-data
Users who are interested in awesome-synthetic-data compare it to the repositories listed below.
- Synthetic Data Generation with Execution-Based Verification and Grounding for LLM Training. ☆19 Feb 7, 2025 Updated last year
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ☆25 Jun 22, 2022 Updated 3 years ago
- plait.py - a fake data modeler ☆436 Dec 27, 2018 Updated 7 years ago
- A software package for privacy-preserving generation of a synthetic twin to a given sensitive data set. ☆56 Sep 3, 2024 Updated last year
- A user-friendly Command & Control (C&C) web platform for remote monitoring, management, and task automation across multiple devices. ☆14 Dec 15, 2024 Updated last year
- Build datasets using natural language ☆566 Sep 19, 2025 Updated 4 months ago
- Streamlit Dashboard over Superstore Data stored in Postgres Docker container. With SQLAlchemy + Plotly Express ☆13 Oct 16, 2024 Updated last year
- Swift package that houses commonly used functions, extensions, views, classes, etc. ☆12 Oct 25, 2025 Updated 3 months ago
- BERT score for text generation ☆12 Jan 15, 2025 Updated last year
- The DJIN model of aging. ☆10 Aug 11, 2022 Updated 3 years ago
- A simple example of VAEs with KANs ☆11 May 17, 2024 Updated last year
- KL3M training data collection and preprocessing ☆20 Apr 14, 2025 Updated 9 months ago
- C inference engine for running GLiClass (Generalist and Lightweight Classification) models ☆16 May 21, 2025 Updated 8 months ago
- A PyMOL plugin with accompanying Docker image for kinase inhibitor binding and affinity prediction ☆12 Jun 3, 2024 Updated last year
- Proof of concept code from Gretel.ai and Illumina using generative neural networks to create synthetic versions of mouse genotype and phe… ☆33 Jan 19, 2022 Updated 4 years ago
- Karpathy's llama2.c transpiled to MLX for Apple Silicon ☆14 Dec 28, 2023 Updated 2 years ago
- MyAssistant Playground --powered by Bedrock Claude & AutoGen ☆12 Mar 26, 2024 Updated last year
- Legalpioneer dataset ☆15 Apr 10, 2025 Updated 10 months ago
- ☆20 Jan 10, 2024 Updated 2 years ago
- 💙 Unstructured Data Connectors for Haystack 2.0 ☆17 Sep 21, 2023 Updated 2 years ago
- How to write integration tests for data pipelines using Great Expectations and pytest. ☆15 Dec 12, 2018 Updated 7 years ago
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆18 Jun 16, 2023 Updated 2 years ago
- Explains Canadian Bills ☆17 May 13, 2023 Updated 2 years ago
- DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) ☆206 Feb 8, 2022 Updated 4 years ago
- Repository for the results of my master thesis, about the generation and evaluation of synthetic data using GANs ☆45 Jun 21, 2023 Updated 2 years ago
- An open-source OpenAI wrapper for a RAG-based chatbot that seamlessly integrates with your documents. ☆22 Nov 27, 2024 Updated last year
- ☆42 Dec 7, 2022 Updated 3 years ago
- A curated list of awesome resources for creating synthetic data ☆44 Feb 16, 2022 Updated 3 years ago
- In this article, I will present an open-source AI tool for writing grant applications, using Microsoft AutoGen combined with Retrieval-Au… ☆22 Jul 19, 2025 Updated 6 months ago
- Simple Implementation of a Transformer in the new framework MLX by Apple ☆19 Nov 18, 2024 Updated last year
- Flask + HTMX demo of PDF chat with Assistants API ☆18 Nov 15, 2023 Updated 2 years ago
- LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development ☆20 Jul 24, 2023 Updated 2 years ago
- Synthetic Data SDK ✨ ☆708 Jan 13, 2026 Updated last month
- A collection of trading settings for the Galileo FX trading robot. These settings are designed to optimize trading strategies across vari… ☆13 Jan 27, 2025 Updated last year
- A CLI in Rust to generate synthetic data for MLX friendly training ☆25 Jan 13, 2024 Updated 2 years ago
- AI Multi-agent system for real-time, adaptive supply chain coordination and optimization leveraging responsive AI clusters. ☆35 Mar 28, 2024 Updated last year
- A reading list on LLM based Synthetic Data Generation 🔥 ☆1,516 Jun 5, 2025 Updated 8 months ago
- An open-source Python library for the assessment of utility and privacy performance of any tabular synthetic dataset. ☆23 Jun 12, 2025 Updated 8 months ago
- Python toolbox for multi-omics data mapping and analysis ☆26 Apr 13, 2023 Updated 2 years ago