EIT-NLP / Distilling-CoT-Reasoning
[ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".
☆17 · Updated 4 months ago
Alternatives and similar repositories for Distilling-CoT-Reasoning
Users interested in Distilling-CoT-Reasoning are comparing it to the repositories listed below
- ☆48 · Updated last month
- [ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆63 · Updated 8 months ago
- ☆15 · Updated 4 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆18 · Updated last month
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆60 · Updated 5 months ago
- Official repository of the paper "Context-DPO: Aligning Language Models for Context-Faithfulness" ☆15 · Updated 5 months ago
- ☆46 · Updated 8 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆22 · Updated 5 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆103 · Updated 2 months ago
- Official implementation of the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency" ☆28 · Updated 3 months ago
- A minimal reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 Game" as an example. Applies zero-RL, SFT, and SFT+RL to elicit the LLM's autonomous verification and reflection abilities. ☆23 · Updated 3 months ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆27 · Updated last month
- ☆36 · Updated 3 months ago
- [arXiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents ☆36 · Updated 2 weeks ago
- ☆47 · Updated 5 months ago
- Official code for the paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆36 · Updated last week
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Updated 6 months ago
- Official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆67 · Updated 2 months ago
- [ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models ☆21 · Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆26 · Updated 4 months ago
- ☆22 · Updated last year
- [ACL 2025 Findings] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling ☆15 · Updated 7 months ago
- Source code of "Reinforcement Learning with Token-level Feedback for Controllable Text Generation" (NAACL 2024) ☆14 · Updated 7 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆27 · Updated last month
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆24 · Updated 7 months ago
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails ☆26 · Updated 4 months ago
- [ICLR'24 Spotlight] Tool-Augmented Reward Modeling ☆51 · Updated last month
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 · Updated last month
- ☆19 · Updated 4 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆49 · Updated 8 months ago