patrickfleith/datafast

patrickfleith / datafast

Synthetic Text Dataset Generation for LLM projects

☆58

Alternatives and similar repositories for datafast

Users who are interested in datafast are comparing it to the libraries listed below.

distil-labs / Distil-PII
☆34 · Dec 20, 2025 · Updated 7 months ago
Shekswess / synthgenai
SynthGenAI - Package for Generating Synthetic Datasets using LLMs.
☆56 · Nov 24, 2025 · Updated 7 months ago
enguard-ai / awesome-ai-guardrails
A curated list of materials on AI guardrails
☆60 · Jun 22, 2026 · Updated 3 weeks ago
rasbt / try-lion-optimizer
☆14 · Mar 9, 2023 · Updated 3 years ago
distil-labs / Distil-expenses
SLMs for personal expenses summaries
☆23 · Dec 20, 2025 · Updated 7 months ago
PresageLabs / PrediBench
Making agents bet on polymarket
☆24 · Oct 15, 2025 · Updated 9 months ago
AnaBelenBarbero / contrasto_ai
Centralize and streamline ML/AI lifecycle observability and compliance processes.
☆12 · Apr 21, 2026 · Updated 3 months ago
distil-labs / distil-gitara
GitAra 🦜🎸: A small function-calling git agent you can run locally
☆28 · Dec 22, 2025 · Updated 6 months ago
huggingface / ai-blueprint
A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs …
☆66 · Feb 6, 2025 · Updated last year
TrevorW-code / fraud
synthetic data for ml
☆25 · Jan 30, 2025 · Updated last year
argilla-io / adept-augmentations
A Python library aimed at dissecting and augmenting NER training data.
☆60 · May 11, 2023 · Updated 3 years ago
freakonometrics / INF7100
Introduction to data science and artificial intelligence
☆18 · Jun 3, 2020 · Updated 6 years ago
patrickfleith / awesome-spacecraft-engineering-datasets
A list of awesome and diverse datasets related to space vehicle engineering for industry and research.
☆110 · Sep 30, 2025 · Updated 9 months ago
MoleculeTransformers / smiles-featurizers
Extract Molecular SMILES embeddings from language models pre-trained with various objectives architectures.
☆19 · Nov 9, 2023 · Updated 2 years ago
AnswerDotAI / fastdata
☆160 · Dec 2, 2024 · Updated last year
MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆108 · Jun 9, 2026 · Updated last month
distil-labs / distil-localdoc.py
SLM assistant for automatic Python documentation
☆18 · Dec 20, 2025 · Updated 7 months ago
timothepearce / synda
A CLI for generating synthetic data
☆43 · May 14, 2025 · Updated last year
langfuse / langfuse-terraform-gcp
🪢 Terraform module to deploy Langfuse on GCP
☆30 · Jun 2, 2026 · Updated last month
hovinh / QII
This is an API to use the Algorithmic Transparency method - Quantitative Input Influence (QII).
☆11 · Feb 18, 2019 · Updated 7 years ago
othr-nlp / rage_toolkit
☆11 · Sep 27, 2024 · Updated last year
zafstojano / policy-gradients
A minimal hackable implementation of policy gradient methods (GRPO, PPO, REINFORCE)
☆16 · Feb 20, 2026 · Updated 5 months ago
Knowledgator / GLiClass
Generalist and Lightweight Model for Text Classification
☆233 · Updated this week
flipz357 / XPLAINSIM
A research toolkit for decomposing and explaining text similarity across neural, structured, and symbolic levels.
☆30 · Apr 11, 2026 · Updated 3 months ago
doubleshow / superlinked
A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal…
☆12 · Sep 16, 2024 · Updated last year
alea-institute / nupunkt
Next-generation Punkt sentence boundary detection with zero dependencies
☆32 · Nov 18, 2025 · Updated 8 months ago
onur-gokyildiz-bhi / tq-kv
Pure Rust implementation of Google's TurboQuant (ICLR 2026) - KV cache compression for LLMs
☆39 · Apr 19, 2026 · Updated 3 months ago
Aswathi-Varma / varivit
☆15 · Mar 12, 2024 · Updated 2 years ago
shmulvad / zero-for-ner
Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge
☆17 · Nov 16, 2021 · Updated 4 years ago
powerapi-ng / joulehunter
Joulehunter helps you find what part of your code is consuming considerable amounts of energy.
☆11 · Nov 2, 2022 · Updated 3 years ago
mkurman / synthlabs
Create synthetic datasets from scratch using AI-powered generation. Define topics, customize prompts, and generate high-quality reasoning…
☆32 · Mar 18, 2026 · Updated 4 months ago
MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆946 · May 24, 2026 · Updated last month
AxelSorensenDev / Eevee
An Easy Annotation Tool for Natural Language Processing
☆12 · May 17, 2024 · Updated 2 years ago
stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 2 months ago
MantisAI / sieves
Plug-and-play document AI with zero-shot models.
☆126 · May 11, 2026 · Updated 2 months ago
syda-ai / syda
AI-powered synthetic data generation - structured tables, unstructured documents, multi-provider LLM support, referential integrity, and …
☆98 · Jul 7, 2026 · Updated 2 weeks ago
awslabs / nki-autotune
☆19 · Updated this week
talkiq / llm-evaluate
☆10 · Nov 12, 2024 · Updated last year
BorisMuzellec / TROT
A Python implementation of discrete optimal transport with a Tsallis entropy regularization.
☆14 · Oct 23, 2023 · Updated 2 years ago