InternLM/OREAL

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

☆190

Alternatives and similar repositories for OREAL

Users who are interested in OREAL are comparing it to the repositories listed below.

Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,096 · Jun 2, 2025 · Updated last year
PRIME-RL / PRIME
Scalable RL solution for advanced reasoning of language models
☆1,865 · Mar 18, 2025 · Updated last year
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,267 · Aug 27, 2025 · Updated 10 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆120 · Jun 23, 2026 · Updated 3 weeks ago
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,868 · Dec 23, 2025 · Updated 6 months ago
openpsi-project / ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
☆335 · Apr 24, 2025 · Updated last year
Unakar / Logic-RL
Reproduce R1 Zero on Logic Puzzle
☆2,453 · Mar 20, 2025 · Updated last year
InternLM / InternEvo
InternEvo is an open-sourced lightweight training framework aims to support model pre-training without the need for extensive dependencie…
☆421 · Aug 21, 2025 · Updated 11 months ago
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆49 · May 26, 2024 · Updated 2 years ago
krystalan / DRT
Deep Reasoning Translation (DRT) Project
☆242 · Sep 1, 2025 · Updated 10 months ago
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆166 · Sep 23, 2025 · Updated 9 months ago
RUCAIBox / Slow_Thinking_with_LLMs
A series of technical report on Slow Thinking with LLM
☆767 · Aug 13, 2025 · Updated 11 months ago
InternLM / Agent-FLAN
[ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
☆361 · Mar 22, 2024 · Updated 2 years ago
GAIR-NLP / O1-Journey
O1 Replication Journey
☆2,000 · Jan 14, 2025 · Updated last year
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆172 · Mar 14, 2025 · Updated last year
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization(TDPO)
☆156 · Feb 14, 2025 · Updated last year
BytedTsinghua-SIA / DAPO
An Open-source RL System from ByteDance Seed and Tsinghua AIR
☆1,840 · May 11, 2025 · Updated last year
Open-Source-O1 / Open-O1
☆1,340 · Nov 21, 2024 · Updated last year
ByteDance-Seed / Seed-Thinking-v1.5
☆810 · Jun 9, 2025 · Updated last year
GAIR-NLP / LIMR
☆220 · Feb 20, 2025 · Updated last year
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,708 · Updated this week
LengSicong / MMR1
[CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
☆217 · Sep 26, 2025 · Updated 9 months ago
open-compass / ANAH
[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2 & [ICLR 2025] Mask-DPO
☆66 · Apr 30, 2025 · Updated last year
ltzheng / SimpleTIR
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401 · Mar 30, 2026 · Updated 3 months ago
davidbrandfonbrener / color-filter-olmo
☆13 · Dec 12, 2025 · Updated 7 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆76 · Apr 22, 2025 · Updated last year
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75 · May 20, 2025 · Updated last year
areal-project / AReaL
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
☆5,575 · Updated this week
huggingface / Math-Verify
☆1,170 · Jan 10, 2026 · Updated 6 months ago
Qihoo360 / Light-R1
☆764 · Dec 23, 2025 · Updated 6 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆55 · Mar 4, 2025 · Updated last year
X2FD / LVIS-INSTRUCT4V
☆134 · Dec 22, 2023 · Updated 2 years ago
eddycmu / demystify-long-cot
☆336 · May 31, 2025 · Updated last year
SkyworkAI / Skywork-OR1
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
☆739 · Jun 6, 2025 · Updated last year
ChenxinAn-fdu / POLARIS
Scaling RL on advanced reasoning models
☆691 · Oct 20, 2025 · Updated 9 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆126 · May 6, 2025 · Updated last year
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆189 · Jul 23, 2025 · Updated 11 months ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆116 · Jan 23, 2025 · Updated last year