lime-RL / DCPO
DCPO: Dynamic Adaptive Clipping for RL
☆39 · Updated last month
Alternatives and similar repositories for DCPO
Users who are interested in DCPO are comparing it to the libraries listed below
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆115 · Updated 6 months ago
- ☆36 · Updated 3 weeks ago
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning". ☆19 · Updated 8 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆28 · Updated 4 months ago
- [NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model ☆56 · Updated 3 weeks ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent ☆66 · Updated 5 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆24 · Updated 2 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆77 · Updated last month
- The demo, code and data of FollowRAG ☆75 · Updated 4 months ago
- ☆23 · Updated 10 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆30 · Updated 2 months ago
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 · Updated last month
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆27 · Updated 3 weeks ago
- ☆46 · Updated 8 months ago
- ☆45 · Updated 3 weeks ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆42 · Updated 8 months ago
- ☆41 · Updated 2 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆170 · Updated 3 months ago
- ☆43 · Updated 3 weeks ago
- A Unified Framework for High-Performance and Extensible LLM Steering ☆89 · Updated last week
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning ☆53 · Updated 3 weeks ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM…