smiles724/Awesome-LLM-RLVR

Collection of latest papers and materials in the area of RLVR!

☆136

Alternatives and similar repositories for Awesome-LLM-RLVR

Users who are interested in Awesome-LLM-RLVR are comparing it to the repositories listed below.

smiles724 / MNPO
The official code of Multi-player Nash Preference Optimization [ICLR 2026]
☆35 • Feb 4, 2026 • Updated 5 months ago
chq1155 / RL-PLM
Official implementation of From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?
☆15 • Jun 19, 2026 • Updated last month
TsinghuaC3I / Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,470 • Nov 9, 2025 • Updated 8 months ago
QingyangZhang / Label-Free-RLVR
☆311 • Jul 6, 2025 • Updated last year
junkangwu / QAE
[ICLR 2026] Quantile Advantage Estimation for Entropy-Safe Reasoning
☆29 • Oct 14, 2025 • Updated 9 months ago
SagnikMukherjee / sparsity_in_rl
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
☆15 • Oct 20, 2025 • Updated 9 months ago
jerry3027 / PolyIE
☆17 • Jan 26, 2024 • Updated 2 years ago
ritaranx / BMRetriever
[EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".
☆26 • Sep 19, 2024 • Updated last year
wshi83 / MedAdapter
[EMNLP'24] MedAdapter: Efficient Test-Time Adaptation of Large Language Models Towards Medical Reasoning
☆36 • Dec 26, 2024 • Updated last year
aaronserianni / attention-iou
[CVPR'25] Attention IoU: Examining Biases in CelebA using Attention Maps
☆13 • Mar 26, 2025 • Updated last year
Mr-Loevan / DPO-Survey
[TPAMI 2026] A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
☆16 • Jun 9, 2026 • Updated last month
avalonstrel / Mitigating-the-Alignment-Tax-of-RLHF
☆16 • Feb 8, 2024 • Updated 2 years ago
ltpo2025 / LTPO
[ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
☆32 • Mar 6, 2026 • Updated 4 months ago
yuelinan / Awesome-Efficient-R1-style-LRMs
☆53 • Jul 12, 2026 • Updated 2 weeks ago
kanishkg / endless-terminals
☆134 • Mar 31, 2026 • Updated 3 months ago
wizard-III / Archer2.0
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg…
☆31 • Oct 10, 2025 • Updated 9 months ago
siyan-zhao / OPSD
☆515 • May 10, 2026 • Updated 2 months ago
yubol-bobo / Awesome-Multi-Turn-LLMs
This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …
☆201 • Jul 11, 2026 • Updated 2 weeks ago
sahilkhose / CS224N
Solutions for Stanford CS224n, Winter 2020.
☆11 • Jun 5, 2021 • Updated 5 years ago
mansicer / self-verification
☆18 • Dec 23, 2025 • Updated 7 months ago
lqtrung1998 / mwp_cot_design
☆14 • Oct 11, 2023 • Updated 2 years ago
TraceElephant / TraceElephant
Repo of "Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems" (ACL 2026)
☆16 • Apr 27, 2026 • Updated 3 months ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,437 • May 11, 2026 • Updated 2 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 • Jan 26, 2025 • Updated last year
JinhaoLee / WCA
[ICML 2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
☆19 • Mar 23, 2026 • Updated 4 months ago
smiles724 / Rel-LLM
☆31 • Sep 23, 2025 • Updated 10 months ago
The-Martyr / Awesome-Multimodal-Reasoning
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal LLMs
☆83 • Updated this week
intervention-training / int
☆16 • Feb 4, 2026 • Updated 5 months ago
seuer123 / 408
☆19 • Nov 11, 2024 • Updated last year
Eclipsess / Awesome-Efficient-Reasoning-LLMs
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
☆786 • Feb 28, 2026 • Updated 5 months ago
ritaranx / Collab-RAG
☆30 • Apr 8, 2025 • Updated last year
ritaranx / ClinGen
[ACL 2024 Findings] This is the code for our paper "Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation wi…
☆43 • Jun 23, 2024 • Updated 2 years ago
rdi-berkeley / awesome-RLVR-boundary
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu…
☆89 • Dec 12, 2025 • Updated 7 months ago
ritaranx / AceSearcher
This is the code repo for the paper AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play (NeurIPS 2025 Spotl…
☆25 • Sep 29, 2025 • Updated 10 months ago
ustc-time-series / CastClaw
☆45 • Jul 17, 2026 • Updated last week
yingyingxia666 / awesome-agentic
A curated reading list of large-language-model RL papers, organized by four research directions: Reasoning RL, Agentic RL, OPD (Off-Polic…
☆25 • Jul 17, 2026 • Updated last week
LTS5 / ReservoirTTA
[preprint] ReservoirTTA: Prolonged Test-time Adaptation for Evolving and Recurring Domains
☆15 • Aug 20, 2025 • Updated 11 months ago
yuanqing-ai / LLM-Hierarchical-Consistency
Official implementation of "Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck" [CVPR'26]
☆16 • Nov 10, 2025 • Updated 8 months ago
chicosirius / think-or-not
☆22 • May 23, 2025 • Updated last year