bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆55 · Updated last month
Alternatives and similar repositories for verifiers
Users interested in verifiers are comparing it to the libraries listed below.
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆15 · Updated 2 weeks ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- ☆24 · Updated 8 months ago
- ☆38 · Updated this week
- ☆64 · Updated 2 months ago
- ☆49 · Updated 6 months ago
- ☆30 · Updated last week
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆59 · Updated 3 months ago
- Script for processing OpenAI's PRM800K process-supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- Code for the EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆54 · Updated 8 months ago
- A repository for research on medium-sized language models. ☆76 · Updated last year
- ☆27 · Updated this week
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆78 · Updated last year
- Simple GRPO scripts and configurations. ☆58 · Updated 3 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆31 · Updated 2 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆42 · Updated this week
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆25 · Updated 2 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP 2024) ☆36 · Updated 5 months ago
- ☆50 · Updated this week
- Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆56 · Updated 5 months ago
- Code for "RATIONALYST: Pre-training Process-Supervision for Improving Reasoning" (https://arxiv.org/pdf/2410.01044) ☆33 · Updated 8 months ago
- ☆46 · Updated 2 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆89 · Updated 2 weeks ago
- Critique-out-Loud Reward Models ☆66 · Updated 7 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 · Updated 2 months ago
- ☆34 · Updated 11 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆59 · Updated last month
- ☆79 · Updated 6 months ago
- ☆41 · Updated 7 months ago