A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).
☆89Dec 12, 2025Updated 6 months ago
Alternatives and similar repositories for awesome-RLVR-boundary
Users that are interested in awesome-RLVR-boundary are comparing it to the libraries listed below.
- ☆19Aug 4, 2025Updated 10 months ago
- Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures☆33Jan 29, 2026Updated 4 months ago
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?☆20Mar 9, 2025Updated last year
- ☆16May 25, 2022Updated 4 years ago
- repo for paper https://arxiv.org/abs/2504.13837☆341Dec 17, 2025Updated 6 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆62Mar 30, 2024Updated 2 years ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆32Jun 5, 2025Updated last year
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆188Jul 23, 2025Updated 10 months ago
- Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"☆35Oct 12, 2025Updated 8 months ago
- ☆18Mar 23, 2025Updated last year
- [ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"☆16Feb 27, 2025Updated last year
- Tools for optimizing steering vectors in LLMs.☆22Apr 10, 2025Updated last year
- Aioli: A unified optimization framework for language model data mixing☆32Jan 17, 2025Updated last year
- ☆189Apr 22, 2026Updated last month
- An evaluation suite for Retrieval-Augmented Generation (RAG).☆24Apr 26, 2025Updated last year
- [ML4H'25] MedVLThinker: Simple Baselines for Multimodal Medical Reasoning☆59Dec 21, 2025Updated 5 months ago
- [ICLR 2022] Official Code Repository for "TRGP: TRUST REGION GRADIENT PROJECTION FOR CONTINUAL LEARNING"☆22Oct 5, 2022Updated 3 years ago
- Codes for Difflare: Removing Image Flare with Latent Diffusion Models☆11Dec 24, 2024Updated last year
- Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals☆11Jan 8, 2026Updated 5 months ago
- [ICLR2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"☆30Feb 4, 2026Updated 4 months ago
- ☆170Aug 27, 2025Updated 9 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods☆198Mar 12, 2026Updated 3 months ago
- ☆13Dec 12, 2025Updated 6 months ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features☆27Nov 20, 2024Updated last year
- ☆13Jun 4, 2024Updated 2 years ago
- MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration☆52Apr 3, 2026Updated 2 months ago
- Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"☆25Dec 12, 2023Updated 2 years ago
- ☆32Aug 9, 2024Updated last year
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025)☆27Feb 25, 2025Updated last year
- ☆33Jun 24, 2024Updated last year
- source code for NeurIPS'22 paper "SIREN: Shaping Representations for Detecting Out-of-Distribution Objects"☆34May 13, 2023Updated 3 years ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)☆20Jan 19, 2025Updated last year
- This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib…☆39Jul 14, 2025Updated 11 months ago
- code for the paper Offline Prioritized Experience Replay☆12Jun 13, 2023Updated 3 years ago
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)☆29Dec 19, 2023Updated 2 years ago
- AlignX-Family is an open-source research suite for advancing personalization in large language models-spanning data, code, models, and be…☆20Jan 12, 2026Updated 5 months ago
- SRTK: Retrieve semantic-relevant subgraphs from large-scale knowledge graphs☆33Sep 22, 2024Updated last year
- ☆21Apr 3, 2026Updated 2 months ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation☆23May 1, 2022Updated 4 years ago