Zhiyuan-Zeng/RLVE

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Zhiyuan-Zeng/RLVE)

Zhiyuan-Zeng / RLVE

[ICML 2026] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

☆226

Alternatives and similar repositories for RLVE

Users that are interested in RLVE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

axon-rl / gem
View on GitHub
A Gym for Agentic LLMs
☆503Jan 21, 2026Updated 6 months ago
PRIME-RL / RL-Compositionality
View on GitHub
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
☆68Jan 26, 2026Updated 6 months ago
ypwang61 / ThetaEvolve
View on GitHub
ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time comput…
☆172Feb 27, 2026Updated 5 months ago
hkust-nlp / model-task-align-rl
View on GitHub
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18Feb 9, 2026Updated 5 months ago
NovaSky-AI / SkyRL
View on GitHub
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,102Updated this week
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
sail-sg / oat
View on GitHub
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆666Jan 29, 2026Updated 6 months ago
ServiceNow / PipelineRL
View on GitHub
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆430Jul 20, 2026Updated last week
RefineBench / refinebench-eval
View on GitHub
Official code and dataset for our paper: RefineBench: Evaluating Refinement Capability of Language Models via Checklists
☆17Dec 1, 2025Updated 7 months ago
shengliu66 / FractionalReason
View on GitHub
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17Jun 30, 2025Updated last year
radixark / miles
View on GitHub
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆1,805Updated this week
GAIR-NLP / OctoThinker
View on GitHub
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189Jul 23, 2025Updated last year
SalesforceAIResearch / LaTRO
View on GitHub
☆127Jun 2, 2026Updated last month
lasgroup / SDPO
View on GitHub
Reinforcement Learning via Self-Distillation (SDPO)
☆1,027Jul 1, 2026Updated 3 weeks ago
AQ-MedAI / MrlX
View on GitHub
MrlX: A Multi-Agent Reinforcement Learning Framework
☆215Jan 19, 2026Updated 6 months ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
xiye17 / TextualExplInContext
View on GitHub
The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)
☆16Feb 11, 2023Updated 3 years ago
zorazrw / agent-skill-induction
View on GitHub
Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks"
☆42Apr 24, 2025Updated last year
open-thought / reasoning-gym
View on GitHub
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,469Apr 17, 2026Updated 3 months ago
Interplay-LM-Reasoning / Interplay-LM-Reasoning
View on GitHub
[ICML 2026 Spotlight] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
☆164Jun 8, 2026Updated last month
ypwang61 / One-Shot-RLVR
View on GitHub
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆444Mar 11, 2026Updated 4 months ago
test-time-training / discover
View on GitHub
☆611May 24, 2026Updated 2 months ago
sunblaze-ucb / omega
View on GitHub
☆47Jun 24, 2025Updated last year
ltzheng / SimpleTIR
View on GitHub
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401Mar 30, 2026Updated 3 months ago
RyanLiu112 / GenPRM
View on GitHub
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆102Nov 8, 2025Updated 8 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
thinking-machines-lab / tinker-project-ideas
View on GitHub
Ideas for projects related to Tinker
☆192Nov 6, 2025Updated 8 months ago
PRIME-RL / Entropy-Mechanism-of-RL
View on GitHub
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆446Jul 11, 2025Updated last year
stellalisy / PrefPalette
View on GitHub
☆21Apr 3, 2026Updated 3 months ago
hkust-nlp / B-STaR
View on GitHub
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86May 21, 2025Updated last year
open-tinker / OpenTinker
View on GitHub
OpenTinker is an RL-as-a-Service infrastructure for foundation models
☆677Mar 21, 2026Updated 4 months ago
zhengkid / Parallel-R1
View on GitHub
The offical repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"
☆260Feb 4, 2026Updated 5 months ago
THUDM / slime
View on GitHub
slime is an LLM post-training framework for RL Scaling.
☆7,679Updated this week
mukhal / ThinkPRM
View on GitHub
[TMLR] Process Reward Models That Think
☆90Nov 29, 2025Updated 8 months ago
PRIME-RL / PRIME
View on GitHub
Scalable RL solution for advanced reasoning of language models
☆1,866Mar 18, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
alestolfo / causal-math
View on GitHub
Code Repository for "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models".
☆15Oct 14, 2022Updated 3 years ago
IcyFish332 / T3RL
View on GitHub
☆48Apr 15, 2026Updated 3 months ago
McGill-NLP / the-markovian-thinker
View on GitHub
Code for paper "The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning"
☆349Mar 16, 2026Updated 4 months ago
Simplified-Reasoning / LUFFY
View on GitHub
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆461Mar 20, 2026Updated 4 months ago
cmavro / PackLLM
View on GitHub
Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization
☆15Apr 25, 2024Updated 2 years ago
PrimeIntellect-ai / prime-rl
View on GitHub
Agentic RL Training at Scale
☆1,759Updated this week
holarissun / RewardModelingBeyondBradleyTerry
View on GitHub
official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆73Apr 2, 2025Updated last year