sunblaze-ucb / awesome-RLVR-boundary

A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).

☆43

Alternatives and similar repositories for awesome-RLVR-boundary

Users who are interested in awesome-RLVR-boundary are comparing it to the repositories listed below.

ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆119 Updated 6 months ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆83 Updated 3 weeks ago
ZhentingWang / DUMP
☆28 Updated 4 months ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 Updated 5 months ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆47 Updated 2 months ago
holarissun / RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆66 Updated 6 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆125 Updated 2 months ago
TIGER-AI-Lab / AceCoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆88 Updated 5 months ago
Gen-Verse / CURE
[NeurIPS 2025 Spotlight] ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning
☆122 Updated 2 weeks ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆61 Updated last year
princeton-pli / what-makes-good-rm
[NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆36 Updated 2 weeks ago
Parallel-Reasoning / APR
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
☆129 Updated last month
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47 Updated 7 months ago
LLM360 / Reasoning360
A repo for open research on building large reasoning models
☆105 Updated last week
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆60 Updated last month
Model-GLUE / Model-GLUE
☆16 Updated last year
ChnQ / TracingLLM
☆30 Updated last year
katiekang1998 / reasoning_generalization
☆33 Updated 8 months ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 Updated last year
peterljq / Parsimonious-Concept-Engineering
PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
☆40 Updated 11 months ago
GuanghaoYe / Emergence-of-Thinking
☆53 Updated 7 months ago
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆83 Updated 11 months ago
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆74 Updated 3 months ago
uservan / ThinkPO
☆18 Updated 2 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆59 Updated 9 months ago
Jiuzhouh / Uncertainty-Aware-Language-Agent
This is the official repo for Towards Uncertainty-Aware Language Agent.
☆28 Updated last year
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆37 Updated last year
yunfeixie233 / ViGaL
☆57 Updated 3 months ago
mandyyyyii / east
☆20 Updated 2 months ago
Dereck0602 / Awesome_Test_Time_LLMs
☆127 Updated 6 months ago