sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆51 Updated 6 months ago
Alternatives and similar repositories for AnytimeReasoner
Users that are interested in AnytimeReasoner are comparing it to the libraries listed below
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 Updated 9 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆55 Updated last week
- ☆19 Updated 9 months ago
- A Sober Look at Language Model Reasoning ☆92 Updated last month
- ☆70 Updated 7 months ago
- Discriminative Constrained Optimization for Reinforcing Large Reasoning Models ☆49 Updated 2 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆152 Updated 6 months ago
- A repo for open research on building large reasoning models ☆127 Updated last week
- Reinforcing General Reasoning without Verifiers ☆93 Updated 6 months ago
- ☆46 Updated 3 months ago
- Official Repository of LatentSeek ☆75 Updated 7 months ago
- ☆64 Updated 2 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆95 Updated 9 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆70 Updated 6 months ago
- ☆50 Updated 11 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆46 Updated 9 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆55 Updated 2 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆72 Updated 8 months ago
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 Updated 4 months ago
- ☆126 Updated this week
- ☆348 Updated 5 months ago
- Extending context length of visual language models ☆12 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆119 Updated 8 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆182 Updated 5 months ago
- ☆21 Updated 8 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆109 Updated 3 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆38 Updated last year
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆36 Updated 9 months ago
- Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ☆62 Updated 7 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆51 Updated 7 months ago