Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆52Jul 15, 2025Updated 7 months ago
Alternatives and similar repositories for AnytimeReasoner
Users that are interested in AnytimeReasoner are comparing it to the libraries listed below
Sorting:
- ☆20Apr 16, 2025Updated 10 months ago
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆36Apr 14, 2025Updated 10 months ago
- The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'☆24May 20, 2025Updated 9 months ago
- Reinforcing General Reasoning without Verifiers☆96Jun 24, 2025Updated 8 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆47Apr 15, 2025Updated 10 months ago
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Feb 9, 2026Updated 3 weeks ago
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"☆22Apr 22, 2025Updated 10 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆42Dec 29, 2025Updated 2 months ago
- ☆34May 9, 2025Updated 9 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".☆26Aug 9, 2025Updated 6 months ago
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".☆20Feb 26, 2025Updated last year
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification☆74Jul 14, 2025Updated 7 months ago
- A Gym for Agentic LLMs☆452Jan 21, 2026Updated last month
- [ArXiv 2025] Denial-of-Service Poisoning Attacks on Large Language Models☆23Oct 22, 2024Updated last year
- [ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models☆45Dec 4, 2024Updated last year
- AnchorAttention: Improved attention for LLMs long-context training☆214Jan 15, 2025Updated last year
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment☆21Apr 2, 2024Updated last year
- ☆25Aug 19, 2025Updated 6 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)☆65Jan 11, 2025Updated last year
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"☆59Jan 5, 2026Updated 2 months ago
- ☆14Mar 5, 2024Updated 2 years ago
- Exploration of automated dataset selection approaches at large scales.☆52Mar 4, 2025Updated last year
- ☆64Jan 12, 2026Updated last month
- ☆23Sep 19, 2024Updated last year
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.☆633Jan 29, 2026Updated last month
- [TMLR 2025] On Memorization in Diffusion Models☆31Oct 5, 2023Updated 2 years ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆27Aug 20, 2025Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆128Jul 24, 2025Updated 7 months ago
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…☆14Jun 6, 2025Updated 8 months ago
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning"☆11Jan 10, 2025Updated last year
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates☆22Jul 1, 2025Updated 8 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning☆31Jan 25, 2026Updated last month
- ☆34Jan 25, 2026Updated last month
- a benchmark to evaluate the situated inductive reasoning☆15Jan 7, 2025Updated last year
- The official github repo for "Training Optimal Large Diffusion Language Models", the first-ever large-scale diffusion language models sca…☆45Nov 6, 2025Updated 3 months ago
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding☆34Jan 16, 2026Updated last month
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion☆14Mar 17, 2025Updated 11 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25]☆38Feb 1, 2026Updated last month