willccbb / verifiers
Verifiers for LLM Reinforcement Learning
☆1,780 · Updated this week
Alternatives and similar repositories for verifiers
Users who are interested in verifiers compare it to the libraries listed below.
- procedural reasoning datasets ☆1,060 · Updated last week
- Recipes to scale inference-time compute of open models ☆1,112 · Updated 3 months ago
- Synthetic data curation for post-training and structured data extraction ☆1,483 · Updated 3 weeks ago
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,249 · Updated last week
- ☆1,033 · Updated 8 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆2,236 · Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,841 · Updated this week
- System 2 Reasoning Link Collection ☆852 · Updated 5 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆591 · Updated this week
- A bibliography and survey of the papers surrounding o1 ☆1,207 · Updated 9 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆738 · Updated this week
- Code and Data for Tau-Bench ☆779 · Updated last month
- [COLM 2025] LIMO: Less is More for Reasoning ☆1,006 · Updated 3 weeks ago
- Textbook on reinforcement learning from human feedback ☆1,185 · Updated last week
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆872 · Updated last week
- Democratizing Reinforcement Learning for LLMs ☆4,043 · Updated this week
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL ☆3,052 · Updated last week
- AllenAI's post-training codebase ☆3,124 · Updated this week
- Automatic evals for LLMs ☆519 · Updated last month
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆546 · Updated 2 weeks ago
- Fully open data curation for reasoning models ☆2,044 · Updated last month
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,068 · Updated last month
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,852 · Updated last week
- Pretraining and inference code for a large-scale depth-recurrent language model ☆816 · Updated last month
- Minimalistic large language model 3D-parallelism training ☆2,150 · Updated last month
- ☆621 · Updated last month
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,537 · Updated 4 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆526 · Updated 3 weeks ago
- Scalable RL solution for advanced reasoning of language models ☆1,685 · Updated 5 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆520 · Updated last month