bespokelabsai / curator
Synthetic data curation for post-training and structured data extraction
☆1,618 · Updated last week
Alternatives and similar repositories for curator
Users interested in curator are comparing it to the libraries listed below.
- Recipes to scale inference-time compute of open models ☆1,124 · Updated 8 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,074 · Updated last week
- Optimizing inference proxy for LLMs ☆3,299 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,279 · Updated last week
- Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification. ☆682 · Updated 10 months ago
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ☆1,326 · Updated 2 weeks ago
- ☆1,033 · Updated last year
- Automatic evals for LLMs ☆578 · Updated last month
- An Open Source Toolkit for LLM Distillation ☆846 · Updated last month
- A reading list on LLM-based Synthetic Data Generation 🔥 ☆1,512 · Updated 7 months ago
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ☆1,088 · Updated last year
- An Open Large Reasoning Model for Real-World Solutions ☆1,533 · Updated 8 months ago
- [COLM 2025] LIMO: Less is More for Reasoning ☆1,061 · Updated 6 months ago
- Our library for RL environments + evals ☆3,791 · Updated this week
- Tool for generating high-quality synthetic datasets ☆1,476 · Updated 3 months ago
- Training Large Language Models to Reason in a Continuous Latent Space ☆1,491 · Updated 5 months ago
- AllenAI's post-training codebase ☆3,551 · Updated this week
- ☆1,385 · Updated 4 months ago
- ☆1,189 · Updated last month
- Code and Data for Tau-Bench ☆1,079 · Updated 5 months ago
- Arena-Hard-Auto: An automatic LLM benchmark. ☆991 · Updated 7 months ago
- Evaluate your LLM's response with Prometheus and GPT-4 💯 ☆1,038 · Updated 9 months ago
- A library for advanced large language model reasoning ☆2,326 · Updated 7 months ago
- Fully open data curation for reasoning models ☆2,200 · Updated 2 months ago
- AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D. ☆1,123 · Updated 2 months ago
- [NeurIPS 2025] Atom of Thoughts for Markov LLM Test-Time Scaling ☆640 · Updated 2 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,295 · Updated 2 weeks ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆2,044 · Updated last month
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆843 · Updated last week
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆425 · Updated last month