Optimization-AI / DisCO

Discriminative Constrained Optimization for Reinforcing Large Reasoning Models

☆49

Alternatives and similar repositories for DisCO

Users who are interested in DisCO are comparing it to the repositories listed below.

bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆92 Updated last month
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆146 Updated 2 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆72 Updated 8 months ago
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆124 Updated 9 months ago
ruixin31 / Spurious_Rewards
☆346 Updated 5 months ago
limenlp / verl
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
☆50 Updated 6 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆70 Updated 5 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆134 Updated 9 months ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆51 Updated 5 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆168 Updated 9 months ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆38 Updated last year
Joshua-Ren / Learning_dynamics_LLM
☆201 Updated 2 weeks ago
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆94 Updated last year
hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆62 Updated 7 months ago
bigai-nlco / LatentSeek
Official Repository of LatentSeek
☆73 Updated 7 months ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆50 Updated last year
kanishkg / cognitive-behaviors
☆219 Updated 9 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆86 Updated 9 months ago
UCSB-NLP-Chang / ThinkPrune
☆46 Updated 3 months ago
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆156 Updated 6 months ago
ChnQ / MI-Peaks
☆60 Updated 5 months ago
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25
☆86 Updated 6 months ago
dongxiangjue / Awesome-LLM-Self-Improvement
A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …
☆98 Updated last year
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆73 Updated 5 months ago
ShadeCloak / ADORA
☆46 Updated 9 months ago
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆55 Updated this week
BytedTsinghua-SIA / Enigmata
Resources for the Enigmata Project.
☆74 Updated 4 months ago
WindyLee0822 / Process_Q_Model
Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆65 Updated 11 months ago
lukahhcm / Awesome_Environment_Scaling
Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to …
☆53 Updated 2 weeks ago
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆127 Updated 8 months ago