DAMO-NLP-SG / LongPO

[ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

☆41

Alternatives and similar repositories for LongPO

Users who are interested in LongPO are comparing it to the repositories listed below.

SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆73 Updated 11 months ago
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆145 Updated 4 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆83 Updated 7 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆114 Updated 5 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆88 Updated last year
inclusionAI / PromptCoT
A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…
☆119 Updated last month
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆186 Updated 4 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆177 Updated 3 months ago
ReasoningTransfer / Transferability-of-LLM-Reasoning
☆99 Updated 2 weeks ago
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆112 Updated 6 months ago
efficientscaling / Z1
[EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"
☆65 Updated 6 months ago
open-compass / CriticEval
[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs
☆47 Updated 10 months ago
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆75 Updated 4 months ago
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆111 Updated 9 months ago
ByteDance-Seed / WideSearch
WideSearch: Benchmarking Agentic Broad Info-Seeking
☆96 Updated 2 weeks ago
TIGER-AI-Lab / AceCoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆91 Updated 6 months ago
RUCAIBox / JiuZhang3.0
The code and data for the paper JiuZhang3.0
☆49 Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆51 Updated 4 months ago
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆30 Updated 2 months ago
yyDing1 / ScaleQuest
[ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆68 Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36 Updated last year
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58 Updated last year
chenllliang / MMEvalPro
[NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆24 Updated last year
Quehry / HelloBench
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
☆52 Updated 11 months ago
bobxwu / learning-from-rewards-llm-papers
A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…
☆57 Updated 4 months ago
mathllm / Step-Controlled_DPO
☆23 Updated last year
clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆122 Updated last year
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75 Updated 5 months ago
nick7nlp / FastCuRL
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning
☆53 Updated 2 weeks ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69 Updated 3 months ago