[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
☆56 May 2, 2025 Updated last year
Alternatives and similar repositories for RSD
Users that are interested in RSD are comparing it to the libraries listed below.
- ☆20 May 14, 2025 Updated last year
- Make reasoning models scalable ☆49 Jun 2, 2026 Updated 2 weeks ago
- ☆34 Oct 13, 2025 Updated 8 months ago
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆30 Jun 2, 2026 Updated 2 weeks ago
- [ACL 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought" ☆17 Apr 3, 2025 Updated last year
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆71 Oct 2, 2025 Updated 8 months ago
- ☆15 Apr 26, 2025 Updated last year
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination. ☆21 Jul 18, 2025 Updated 11 months ago
- ☆20 Mar 18, 2026 Updated 3 months ago
- An adaptive sampling framework for Reinforce-style LLM post training. ☆96 Nov 29, 2025 Updated 6 months ago
- Continuous Pipelined Speculative Decoding ☆20 May 25, 2026 Updated 3 weeks ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆24 May 6, 2026 Updated last month
- VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models ☆26 Mar 26, 2025 Updated last year
- [ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ☆83 Jul 14, 2025 Updated 11 months ago
- Agent-RRM: Exploring Reasoning Reward Model for Agents ☆69 Mar 17, 2026 Updated 3 months ago
- Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks ☆11 Oct 21, 2022 Updated 3 years ago
- Mirror for Java and PHP libraries and text resources to facilitate the use of Inuktitut in its written form on computers and the web ☆10 Aug 2, 2015 Updated 10 years ago
- PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System] ☆58 Jun 12, 2026 Updated last week
- JMLR Cover Letter Template ☆10 Dec 15, 2021 Updated 4 years ago
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ☆281 Aug 31, 2024 Updated last year
- SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs (ICML 2025) ☆36 Nov 28, 2025 Updated 6 months ago
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs" ☆23 Mar 25, 2026 Updated 2 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆111 Nov 22, 2025 Updated 6 months ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆69 Feb 21, 2025 Updated last year
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance" ☆46 Apr 21, 2024 Updated 2 years ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆43 Oct 31, 2025 Updated 7 months ago
- ojjson is a library designed to facilitate JSON interactions with Ollama, a large language api (LLM). It leverages the power of Zod for s… ☆12 Nov 7, 2024 Updated last year
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding ☆23 Mar 2, 2025 Updated last year
- FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model] ☆51 Apr 29, 2026 Updated last month
- CFG-GAN: Composite functional gradient learning of generative adversarial models ☆15 Jul 9, 2020 Updated 5 years ago
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice ☆87 Feb 27, 2026 Updated 3 months ago
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆269 May 5, 2025 Updated last year
- Extension of libSVM to support Open Set Recognition as described in "Toward Open Set Recognition", TPAMI July 2013 ☆12 Oct 21, 2013 Updated 12 years ago
- (ACL 2025 Main) Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillat… ☆35 Aug 23, 2025 Updated 9 months ago
- KV cache compression via sparse coding ☆18 Oct 26, 2025 Updated 7 months ago
- ☆11 Sep 25, 2025 Updated 8 months ago
- Browser extension to quickly access Perplexity searchbar from any page with a shortcut ☆13 Oct 20, 2024 Updated last year
- SMART introduces a novel test-time framework where Small Language Models (SLMs) reason step-by-step, and Large Language Models (LLMs) pro… ☆12 Jul 9, 2025 Updated 11 months ago
- SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models ☆17 Jun 24, 2024 Updated last year