WANGXinyiLinda / planning_tokens

Official code for Guiding Language Model Math Reasoning with Planning Tokens

☆15

Alternatives and similar repositories for planning_tokens

Users who are interested in planning_tokens are comparing it to the repositories listed below.

GeniusHTX / TALE
☆126 Updated 2 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆81 Updated 5 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆64 Updated 3 weeks ago
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆81 Updated last month
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆39 Updated 4 months ago
hemingkx / SWIFT
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆52 Updated 5 months ago
JayZhang42 / SLED
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433
☆28 Updated 8 months ago
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆77 Updated 2 months ago
TianHongZXY / RLVR-Decomposed
Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆83 Updated 3 weeks ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆61 Updated last year
Dereck0602 / Awesome_Test_Time_LLMs
☆117 Updated 4 months ago
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆86 Updated 5 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆162 Updated this week
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆29 Updated 8 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆22 Updated 7 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆105 Updated 3 months ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆114 Updated last year
ChnQ / MI-Peaks
☆49 Updated 3 weeks ago
GAIR-NLP / weak-to-strong-reasoning
☆59 Updated 11 months ago
hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆52 Updated 2 months ago
uservan / ThinkPO
☆18 Updated last week
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆127 Updated 4 months ago
sail-sg / sdft
[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
☆125 Updated 9 months ago
Zanette-Labs / SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆49 Updated 9 months ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆111 Updated 7 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆76 Updated 4 months ago
yyDing1 / ScaleQuest
[ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆63 Updated 9 months ago
THU-KEG / RM-Bench
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆58 Updated 3 weeks ago
UCSB-NLP-Chang / ThinkPrune
☆39 Updated 3 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆58 Updated 8 months ago