EIT-NLP / Distilling-CoT-Reasoning

[ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".

☆19

Alternatives and similar repositories for Distilling-CoT-Reasoning

Users who are interested in Distilling-CoT-Reasoning are comparing it to the repositories listed below.

rookie-joe / AutoPSV
☆50 Updated last year
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆16 Updated 11 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69 Updated 4 months ago
hkust-nlp / RL-Verifier-Robustness
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆23 Updated last month
hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆59 Updated 5 months ago
WangHanLinHenry / SPA-RL-Agent
Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution"
☆49 Updated 2 months ago
lichengliu03 / unary-feedback
☆38 Updated 3 months ago
MozerWang / AMPO
[arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents
☆46 Updated 4 months ago
yyDing1 / ScaleQuest
[ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM…
☆68 Updated last year
mathllm / Step-Controlled_DPO
☆23 Updated last year
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆24 Updated 3 months ago
UCSB-NLP-Chang / ThinkPrune
☆45 Updated last month
yayayacc / MUR
☆45 Updated last month
Zayne-sprague / To-CoT-or-not-to-CoT
☆25 Updated 7 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆62 Updated 11 months ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆84 Updated 9 months ago
hkust-nlp / GUIMid
☆21 Updated 6 months ago
weizhepei / WebAgent-R1
[EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
☆59 Updated 2 weeks ago
bobxwu / learning-from-rewards-llm-papers
A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…
☆58 Updated 5 months ago
byronBBL / Context-DPO
Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"
☆18 Updated 9 months ago
test-time-interaction / TTI
☆64 Updated 5 months ago
YiCheng98 / IntegrativeDecoding
Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"
☆32 Updated 7 months ago
RUCKBReasoning / CoT-based-Synthesizer
Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis'
☆31 Updated 6 months ago
zhaochen0110 / Cotempqa
Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024)
☆32 Updated last year
LightChen233 / reasoning-boundary
☆69 Updated 5 months ago
starrYYxuan / LeCo
This is the implementation of LeCo
☆31 Updated 10 months ago
RUCAIBox / R1-Searcher-plus
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
☆65 Updated 5 months ago
ZhentingWang / DUMP
☆32 Updated 6 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆24 Updated 10 months ago
xuyige / SoftCoT
ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…
☆61 Updated 5 months ago