willccbb / verifiers
Verifiers for LLM Reinforcement Learning
☆727 · Updated last week
Alternatives and similar repositories for verifiers:
Users who are interested in verifiers are comparing it to the libraries listed below.
- procedural reasoning datasets ☆541 · Updated this week
- Recipes to scale inference-time compute of open models ☆1,048 · Updated last month
- Synthetic data curation for post-training and structured data extraction ☆1,097 · Updated this week
- System 2 Reasoning Link Collection ☆812 · Updated 2 weeks ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆403 · Updated 2 weeks ago
- Build your own visual reasoning model ☆320 · Updated this week
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆425 · Updated 6 months ago
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,015 · Updated 2 months ago
- ☆1,011 · Updated 3 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,183 · Updated 4 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆1,265 · Updated this week
- ☆913 · Updated 2 months ago
- ☆574 · Updated 2 weeks ago
- Automatic evals for LLMs ☆346 · Updated this week
- Pretraining code for a large-scale depth-recurrent language model ☆709 · Updated 2 weeks ago
- An Open Source Toolkit For LLM Distillation ☆554 · Updated 2 months ago
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL ☆1,466 · Updated this week
- ☆493 · Updated this week
- LIMO: Less is More for Reasoning ☆875 · Updated last month
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,336 · Updated this week
- Understanding R1-Zero-Like Training: A Critical Perspective ☆725 · Updated this week
- ☆526 · Updated this week
- Large Reasoning Models ☆800 · Updated 3 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆656 · Updated 2 months ago
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆459 · Updated this week
- ☆504 · Updated 4 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆300 · Updated 4 months ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆477 · Updated 2 weeks ago
- Code and Data for Tau-Bench ☆358 · Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆311 · Updated 3 months ago