PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆65Oct 2, 2025Updated 5 months ago
Alternatives and similar repositories for specreason
Users that are interested in specreason are comparing it to the libraries listed below
Sorting:
- ☆32Oct 13, 2025Updated 4 months ago
- ☆20May 14, 2025Updated 9 months ago
- ☆34Jun 22, 2024Updated last year
- ☆15Nov 7, 2024Updated last year
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un…☆17Dec 17, 2025Updated 2 months ago
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention☆53Aug 6, 2025Updated 6 months ago
- ☆37Sep 13, 2025Updated 5 months ago
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…☆30Oct 30, 2025Updated 4 months ago
- APEX+ is an LLM Serving Simulator☆42Jun 16, 2025Updated 8 months ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification☆74Jul 14, 2025Updated 7 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization☆38Sep 24, 2024Updated last year
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length☆147Dec 23, 2025Updated 2 months ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity☆22Aug 28, 2025Updated 6 months ago
- ☆14Apr 14, 2025Updated 10 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.☆56May 2, 2025Updated 10 months ago
- Official implementation for Text Generation Beyond Discrete Token Sampling☆21Aug 11, 2025Updated 6 months ago
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆27Jul 30, 2025Updated 7 months ago
- a simple API to use CUPTI☆11Aug 19, 2025Updated 6 months ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025)☆15Jan 6, 2026Updated last month
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs☆27Jun 25, 2024Updated last year
- Official implementation for "Mixture of In-Context Experts Enhance LLMs’ Awareness of Long Contexts" (Accepted by Neurips2024)☆13Jan 7, 2025Updated last year
- Inverse Scaling in Test-Time Compute☆25Dec 3, 2025Updated 3 months ago
- ☆21Dec 25, 2025Updated 2 months ago
- Code and data release of the paper Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows☆14Oct 4, 2024Updated last year
- ☆13Jul 10, 2024Updated last year
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…☆15Oct 31, 2025Updated 4 months ago
- An official repository for GPTailor☆17Jun 29, 2025Updated 8 months ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆15Feb 4, 2025Updated last year
- ☆55Jul 7, 2025Updated 7 months ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025)☆30Jun 14, 2024Updated last year
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆142Dec 17, 2025Updated 2 months ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…☆80May 30, 2025Updated 9 months ago
- ☆13Jan 15, 2025Updated last year
- ☆12Oct 16, 2022Updated 3 years ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs☆24Sep 23, 2025Updated 5 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Dec 19, 2024Updated last year
- [AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data☆33Apr 7, 2025Updated 10 months ago
- [Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models☆20Oct 2, 2024Updated last year
- Quantization in the Jagged Loss Landscape of Vision Transformers☆13Oct 22, 2023Updated 2 years ago