yafuly / TPO

Test-time preference optimization (ICML 2025).

☆168

Alternatives and similar repositories for TPO

Users who are interested in TPO are comparing it to the libraries listed below.

cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆258 Updated 5 months ago
THU-KEG / AdaptThink
☆157 Updated 2 weeks ago
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆135 Updated 6 months ago
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆93 Updated 8 months ago
multimodal-art-projection / LatentCoT-Horizon
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
☆247 Updated 3 weeks ago
RUC-NLPIR / Tool-Star
🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
☆270 Updated last week
lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)
☆155 Updated last week
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆112 Updated 6 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆174 Updated 2 months ago
GAIR-NLP / LIMR
☆211 Updated 8 months ago
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆158 Updated last month
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆178 Updated 3 months ago
ruixin31 / Spurious_Rewards
☆333 Updated 2 months ago
GeniusHTX / TALE
☆133 Updated last month
RyanLiu112 / GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆83 Updated 4 months ago
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆140 Updated 3 months ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆185 Updated 4 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆86 Updated 8 months ago
yubol-bobo / Awesome-Multi-Turn-LLMs
This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …
☆125 Updated 5 months ago
GAIR-NLP / ToRL
☆300 Updated 5 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆82 Updated 7 months ago
NVlabs / Tool-N1
☆208 Updated 4 months ago
SuperGPQA / SuperGPQA
☆169 Updated 5 months ago
LightChen233 / reasoning-boundary
☆68 Updated 4 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆130 Updated 7 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆348 Updated 3 weeks ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆111 Updated 9 months ago
sail-sg / sdft
[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
☆131 Updated 11 months ago
dvlab-research / ARPO
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
☆130 Updated 4 months ago
MingyuJ666 / Disentangling-Memory-and-Reasoning
[ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
☆77 Updated last month