ruipeterpan/specreason

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ruipeterpan/specreason)

ruipeterpan / specreason

PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]

☆74

Alternatives and similar repositories for specreason

Users that are interested in specreason are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

uservan / speculative_thinking
View on GitHub
☆34Oct 13, 2025Updated 9 months ago
Jikai0Wang / Speculative_CoT
View on GitHub
☆20May 14, 2025Updated last year
ruipeterpan / failfast
View on GitHub
Artifact for "Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs" [arXiv '25]
☆20May 4, 2026Updated 2 months ago
DerrickYLJ / LessIsMore
View on GitHub
[ICML 2026] Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
☆34Sep 12, 2025Updated 10 months ago
BaohaoLiao / RSD
View on GitHub
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
☆56May 2, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ruipeterpan / marconi
View on GitHub
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
☆63Mar 5, 2025Updated last year
guanyilin428 / Dynamic-Speculative-Planning
View on GitHub
☆48Sep 13, 2025Updated 10 months ago
shengliu66 / FractionalReason
View on GitHub
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17Jun 30, 2025Updated last year
jiwonsong-dev / ReasoningPathCompression
View on GitHub
[NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆32Oct 20, 2025Updated 9 months ago
garipovroma / autojudge
View on GitHub
[NeurIPS 2025] Official PyTorch implementation for the paper AutoJudge: Judge Decoding Without Manual Annotation
☆21Dec 22, 2025Updated 7 months ago
sspec-project / SparseSpec
View on GitHub
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆115Dec 2, 2025Updated 7 months ago
VITA-Group / SEAL
View on GitHub
[COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆60Apr 6, 2025Updated last year
abdelfattah-lab / SplitReason
View on GitHub
☆20Mar 18, 2026Updated 4 months ago
enyac-group / evol-q
View on GitHub
Quantization in the Jagged Loss Landscape of Vision Transformers
☆13Oct 22, 2023Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
ArminAzizi98 / LaMDA
View on GitHub
☆15Nov 7, 2024Updated last year
cornell-zhang / llm-datatypes
View on GitHub
Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
☆27Jun 25, 2024Updated 2 years ago
smart-lty / ParallelSpeculativeDecoding
View on GitHub
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆170Dec 23, 2025Updated 7 months ago
flexflow / flexflow-serve
View on GitHub
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆85Sep 15, 2025Updated 10 months ago
sail-sg / LongSpec
View on GitHub
[ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆84Jul 14, 2025Updated last year
hao-ai-lab / Dynasor
View on GitHub
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.
☆232May 31, 2025Updated last year
togethercomputer / saw-int4
View on GitHub
Official implementation of Paper "System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving"
☆30Apr 17, 2026Updated 3 months ago
chenyu-jiang / dcp
View on GitHub
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21Nov 28, 2025Updated 7 months ago
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57Aug 6, 2025Updated 11 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
mit-han-lab / fastrl
View on GitHub
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆174Feb 27, 2026Updated 4 months ago
hao-ai-lab / LookaheadReasoning
View on GitHub
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
☆69Oct 31, 2025Updated 8 months ago
czg1225 / VeriThinker
View on GitHub
[NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient
☆67Sep 27, 2025Updated 9 months ago
UCSB-AI / Soft-Thinking
View on GitHub
Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"
☆345Jun 12, 2026Updated last month
NVlabs / DLER
View on GitHub
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
☆17Nov 11, 2025Updated 8 months ago
Infini-AI-Lab / TriForce
View on GitHub
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281Aug 31, 2024Updated last year
hao-ai-lab / DistCA
View on GitHub
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆106Apr 7, 2026Updated 3 months ago
LinB203 / FSDP-Training
View on GitHub
Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA
☆32Nov 27, 2025Updated 7 months ago
Ying1123 / llm-caching-multiplexing
View on GitHub
☆19Jun 3, 2023Updated 3 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
jiazhihao / attention_superoptimizer
View on GitHub
An Attention Superoptimizer
☆22Jan 20, 2025Updated last year
zhengkid / Parallel-Probe
View on GitHub
The offical repo for "Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing"
☆19Feb 3, 2026Updated 5 months ago
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆421Nov 20, 2025Updated 8 months ago
Trustworthy-ML-Lab / ThinkEdit
View on GitHub
[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un…
☆19Dec 17, 2025Updated 7 months ago
hao-ai-lab / flash-attention-fp4
View on GitHub
NVFP4 Flash-Attention 4 on BlackWell
☆30Updated this week
DiT-Serving / TetriServe
View on GitHub
[ASPLOS' 26] TetriServe: Efficiently Serving Mixed DiT Workloads
☆17Mar 12, 2026Updated 4 months ago
Infini-AI-Lab / Kinetics
View on GitHub
Kinetics: Rethinking Test-Time Scaling Laws
☆87Jul 11, 2025Updated last year