THU-KEG / AdaptThink
☆112 · Updated 3 weeks ago
Alternatives and similar repositories for AdaptThink
Users interested in AdaptThink are comparing it to the repositories listed below.
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆54 · Updated 6 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆108 · Updated 2 weeks ago
- ☆61 · Updated this week
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆155 · Updated 3 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆105 · Updated 4 months ago
- Official repository for the paper "O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning" ☆82 · Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆112 · Updated 2 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆72 · Updated 2 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆60 · Updated last month
- The code and data of DPA-RAG, accepted to the WWW 2025 main conference. ☆61 · Updated 5 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆220 · Updated last month
- Model merging is a highly efficient approach for long-to-short reasoning. ☆62 · Updated 2 weeks ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning ☆178 · Updated last week
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" ☆122 · Updated 7 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ☆78 · Updated 5 months ago
- [ACL 2025] SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs; preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ☆28 · Updated 3 weeks ago
- A version of verl that supports tool use ☆251 · Updated this week
- Large Language Models Can Self-Improve in Long-context Reasoning ☆70 · Updated 6 months ago
- ☆116 · Updated last month
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆147 · Updated 2 weeks ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆45 · Updated 7 months ago
- [ACL'25] We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆63 · Updated 7 months ago
- Repository for the paper "Free Process Rewards without Process Labels" ☆152 · Updated 3 months ago
- ☆107 · Updated 3 months ago
- The demo, code, and data of FollowRAG ☆72 · Updated last month
- ☆65 · Updated 2 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆157 · Updated 2 weeks ago
- [arXiv:2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents ☆32 · Updated last month
- Fantastic Data Engineering for Large Language Models ☆89 · Updated 5 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆72 · Updated 4 months ago