rdi-berkeley / awesome-RLVR-boundaryView external linksLinks
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).
☆85Dec 12, 2025Updated 2 months ago
Alternatives and similar repositories for awesome-RLVR-boundary
Users that are interested in awesome-RLVR-boundary are comparing it to the libraries listed below
Sorting:
- ☆19Aug 4, 2025Updated 6 months ago
- The repository contains code for Adaptive Data Optimization☆32Dec 9, 2024Updated last year
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆31Jun 5, 2025Updated 8 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆62Mar 30, 2024Updated last year
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning"☆11Jan 10, 2025Updated last year
- Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures☆30Jan 29, 2026Updated 2 weeks ago
- Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs""☆29Oct 12, 2025Updated 4 months ago
- ☆13Jun 4, 2024Updated last year
- ☆13Dec 12, 2025Updated 2 months ago
- repo for paper https://arxiv.org/abs/2504.13837☆327Dec 17, 2025Updated last month
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆182Jul 23, 2025Updated 6 months ago
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?☆19Mar 9, 2025Updated 11 months ago
- ☆17Mar 23, 2025Updated 10 months ago
- [ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"☆17Feb 27, 2025Updated 11 months ago
- ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks☆18Aug 16, 2024Updated last year
- ☆18Aug 19, 2024Updated last year
- ☆16May 25, 2022Updated 3 years ago
- IntructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆32Jun 13, 2024Updated last year
- ☆185Nov 17, 2025Updated 2 months ago
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models☆19Aug 17, 2025Updated 5 months ago
- MAM: ModularMulti-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration☆37Jun 25, 2025Updated 7 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods☆163Jun 25, 2025Updated 7 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)☆20Jan 19, 2025Updated last year
- ☆23Jan 17, 2025Updated last year
- ☆22Jan 25, 2023Updated 3 years ago
- An evaluation suite for Retrieval-Augmented Generation (RAG).☆23Apr 26, 2025Updated 9 months ago
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆149Oct 10, 2025Updated 4 months ago
- Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"☆25Dec 12, 2023Updated 2 years ago
- [ICLR 2022] Official Code Repository for "TRGP: TRUST REGION GRADIENT PROJECTION FOR CONTINUAL LEARNING"☆22Oct 5, 2022Updated 3 years ago
- Steering Llama 2 with Contrastive Activation Addition☆209May 23, 2024Updated last year
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025)☆26Feb 25, 2025Updated 11 months ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF☆24Oct 8, 2024Updated last year
- An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation☆27Jun 7, 2024Updated last year
- [ICLR2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"☆30Feb 4, 2026Updated last week
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep☆173Apr 23, 2025Updated 9 months ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation☆24May 1, 2022Updated 3 years ago
- ☆23Mar 31, 2023Updated 2 years ago
- "In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; ICML 2024.☆29Oct 18, 2023Updated 2 years ago