argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

☆3,344

Alternatives and similar repositories for distilabel

Users that are interested in distilabel are comparing it to the libraries listed below.

argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,056 • Jul 20, 2026 • Updated last week
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,261 • Jun 17, 2026 • Updated last month
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,645 • May 26, 2026 • Updated 2 months ago
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆12,255 • Updated this week
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,227 • Updated this week
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆2,498 • Jun 29, 2026 • Updated 3 weeks ago
huggingface / open-r1
Fully open reproduction of DeepSeek-R1
☆26,415 • Apr 2, 2026 • Updated 3 months ago
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,934 • Updated this week
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,767 • May 26, 2026 • Updated 2 months ago
argilla-io / synthetic-data-generator
Build datasets using natural language
☆587 • Sep 19, 2025 • Updated 10 months ago
dottxt-ai / outlines
Structured Outputs
☆15,350 • Updated this week
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,415 • Jul 13, 2026 • Updated 2 weeks ago
bespokelabsai / curator
Synthetic data curation for post-training and structured data extraction
☆1,704 • Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,388 • Updated this week
datadreamer-dev / DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
☆1,115 • Feb 2, 2025 • Updated last year
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,882 • Mar 21, 2026 • Updated 4 months ago
mlabonne / llm-datasets
Curated list of datasets and tools for post-training.
☆4,710 • Apr 29, 2026 • Updated 2 months ago
huggingface / cosmopedia
☆572 • Nov 20, 2024 • Updated last year
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆874 • Mar 17, 2025 • Updated last year
allenai / open-instruct
AllenAI's post-training codebase
☆3,810 • Updated this week
databricks / lilac
Curate better data for LLMs
☆1,072 • Mar 19, 2024 • Updated 2 years ago
AnswerDotAI / RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆3,942 • May 17, 2025 • Updated last year
Lightning-AI / litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
☆13,564 • Jul 20, 2026 • Updated last week
meta-pytorch / torchtune
PyTorch native post-training library
☆5,787 • Updated this week
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,918 • Updated this week
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,756 • Updated this week
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆3,820 • May 28, 2026 • Updated last month
linkedin / Liger-Kernel
Efficient Triton Kernels for LLM Training
☆6,535 • Updated this week
wasiahmad / Awesome-LLM-Synthetic-Data
A reading list on LLM based Synthetic Data Generation 🔥
☆1,545 • Jun 5, 2025 • Updated last year
guidance-ai / guidance
A guidance language for controlling large language models.
☆21,694 • May 21, 2026 • Updated 2 months ago
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆1,626 • Dec 20, 2025 • Updated 7 months ago
mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆695 • May 7, 2024 • Updated 2 years ago
davanstrien / awesome-synthetic-datasets
awesome synthetic (text) datasets
☆335 • Jan 8, 2026 • Updated 6 months ago
arcee-ai / DistillKit
An Open Source Toolkit For LLM Distillation
☆990 • May 12, 2026 • Updated 2 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,667 • Updated this week
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,732 • Updated this week
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,453 • Updated this week
MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆954 • May 24, 2026 • Updated 2 months ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,777 • May 26, 2026 • Updated 2 months ago