liushulinle/UloRL

An Ultra-Long Output Reinforcement Learning Approach

☆23

Alternatives and similar repositories for UloRL

Users who are interested in UloRL are comparing it to the repositories listed below.

hkust-nlp / RL-Verifier-Robustness
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆24 · Oct 7, 2025 · Updated 9 months ago
SWE-Gym / SWE-Bench-Fork
☆13 · Mar 5, 2025 · Updated last year
wdlctc / mini-s
☆51 · Oct 29, 2024 · Updated last year
Kwai-Klear / CE-GPPO
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
☆16 · Jan 23, 2026 · Updated 6 months ago
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆99 · Jun 16, 2025 · Updated last year
RUCAIBox / Passk_Training
The official repository of paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
☆113 · Aug 15, 2025 · Updated 11 months ago
lukahhcm / Awesome_Environment_Scaling
Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to …
☆72 · Jan 28, 2026 · Updated 6 months ago
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 · Apr 11, 2026 · Updated 3 months ago
suu990901 / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆82 · Dec 25, 2025 · Updated 7 months ago
MasterVito / SwS
Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning
☆42 · Nov 11, 2025 · Updated 8 months ago
zhang677 / PCL-lite
[ICML 2025] Adaptive Self-improvement LLM Agentic System for ML Library Development
☆17 · Jan 6, 2026 · Updated 6 months ago
shengliu66 / FractionalReason
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
Tencent-Hunyuan / C3-Benchmark
C^3-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking
☆38 · Mar 1, 2026 · Updated 4 months ago
T-Lab-CUHKSZ / G2RPO-A
[ACL 2026] G2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
☆16 · May 20, 2026 · Updated 2 months ago
ArminAzizi98 / LaMDA
☆15 · Nov 7, 2024 · Updated last year
multimodal-art-projection / TreePO
☆65 · Mar 30, 2026 · Updated 3 months ago
royeisen / reasoning_loading_bar
☆56 · Jul 7, 2025 · Updated last year
ZJU-REAL / HBPO
☆34 · Aug 11, 2025 · Updated 11 months ago
zjunlp / KnowRL
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
☆48 · May 19, 2026 · Updated 2 months ago
MindLab-Research / longstraw
MinT-2M: Long-context training system for resident-prefix GRPO
☆39 · Updated this week
Infini-AI-Lab / GRESO
☆82 · Jun 8, 2026 · Updated last month
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆44 · Aug 6, 2025 · Updated 11 months ago
assafbk / OPRM
Overflow Prevention Enhances Long-Context Recurrent LLMs (COLM 2025)
☆18 · Jul 8, 2025 · Updated last year
bbartoldson / TBA
Official implementation of TBA for async LLM post-training.
☆32 · Nov 5, 2025 · Updated 8 months ago
zongqianwu / ST-COT
(ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training
☆13 · Feb 15, 2025 · Updated last year
ypw0102 / BatchEval
code for ACL2024-main: BatchEval: Towards Human-like Text Evaluation
☆19 · May 20, 2024 · Updated 2 years ago
amazon-science / Self-Aligned-Reward-Towards_Effective_and_Efficient_Reasoners
☆21 · Apr 21, 2026 · Updated 3 months ago
Qwen-Applications / GD2PO
☆20 · Jun 16, 2026 · Updated last month
kaiwenzha / RL-Tango
[NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
☆57 · Oct 23, 2025 · Updated 9 months ago
InternScience / MME-Reasoning
Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs
☆45 · Jun 17, 2025 · Updated last year
Adaxry / Unified_Layer_Skipping
☆15 · Apr 11, 2024 · Updated 2 years ago
HJYao00 / MMReason
[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI
☆15 · Apr 25, 2026 · Updated 3 months ago
ZihaoHuang-notabot / Ultra-Sparse-Memory-Network
☆48 · Jul 3, 2026 · Updated 3 weeks ago
Leey21 / A-Data-Centric-Study
☆18 · Mar 2, 2026 · Updated 4 months ago
EPFL-IMOS / TrustVLM
To Trust Or Not To Trust Your Vision-Language Model's Prediction
☆15 · May 30, 2025 · Updated last year
seamoke / DPH-RL
This is the official implementation of paper "The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement…
☆20 · Feb 10, 2026 · Updated 5 months ago
Elvin-Yiming-Du / Memory-T1
This repository is used for the time reasoning task for multi-session dialogue systems.
☆17 · Feb 7, 2026 · Updated 5 months ago
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆102 · Jun 24, 2025 · Updated last year
sailing-lab / sr2am
SR²AM: Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
☆21 · May 22, 2026 · Updated 2 months ago