sunblaze-ucb / awesome-RLVR-boundary
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).
☆43 Updated this week
Alternatives and similar repositories for awesome-RLVR-boundary
Users who are interested in awesome-RLVR-boundary are comparing it to the repositories listed below.
- Code for "Reasoning to Learn from Latent Thoughts" ☆119 Updated 6 months ago
- A Sober Look at Language Model Reasoning ☆83 Updated 3 weeks ago
- ☆28 Updated 4 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated 5 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆47 Updated 2 months ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆66 Updated 6 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆125 Updated 2 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆88 Updated 5 months ago
- [NeurIPS 2025 Spotlight] ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆122 Updated 2 weeks ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆61 Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆36 Updated 2 weeks ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆129 Updated last month
- Exploration of automated dataset selection approaches at large scales. ☆47 Updated 7 months ago
- A repo for open research on building large reasoning models ☆105 Updated last week
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". ☆60 Updated last month
- ☆16 Updated last year
- ☆30 Updated last year
- ☆33 Updated 8 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆123 Updated last year
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆40 Updated 11 months ago
- ☆53 Updated 7 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆83 Updated 11 months ago
- [ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆74 Updated 3 months ago
- ☆18 Updated 2 months ago
- [2025-TMLR] A Survey on the Honesty of Large Language Models ☆59 Updated 9 months ago
- This is the official repo for Towards Uncertainty-Aware Language Agent. ☆28 Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆37 Updated last year
- ☆57 Updated 3 months ago
- ☆20 Updated 2 months ago
- ☆127 Updated 6 months ago