UMass-Embodied-AGI / BudgetGuidance
Steering LLM Thinking with Budget Guidance
☆24Updated last month
Alternatives and similar repositories for BudgetGuidance
Users who are interested in BudgetGuidance are comparing it to the repositories listed below:
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆77Updated 6 months ago
- ☆27Updated 3 months ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training☆43Updated 2 months ago
- ☆35Updated 4 months ago
- SSRL: Self-Search Reinforcement Learning☆144Updated last month
- Code for the paper: "Learning to Reason without External Rewards"☆355Updated 2 months ago
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond☆166Updated 2 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.☆47Updated 4 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache☆124Updated last month
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning☆147Updated last week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆172Updated 8 months ago
- A repo for open research on building large reasoning models☆103Updated last week
- ☆93Updated 3 months ago
- ☆215Updated 7 months ago
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents☆31Updated this week
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆245Updated 4 months ago
- ☆51Updated 3 months ago
- ☆16Updated 3 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)☆108Updated last week
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)☆37Updated last week
- The official repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"☆162Updated last week
- Verlog: A Multi-turn RL framework for LLM agents☆41Updated 2 weeks ago
- [EMNLP 2025] The official implementation for the paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆99Updated 3 weeks ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models☆219Updated last week
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆105Updated 3 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers☆58Updated 6 months ago
- ☆85Updated 2 months ago
- EvaByte: Efficient Byte-level Language Models at Scale☆109Updated 5 months ago
- ☆19Updated 6 months ago
- [NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis☆110Updated 3 months ago