hao-ai-lab/LookaheadReasoning

[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning

☆69

Alternatives and similar repositories for LookaheadReasoning

Users who are interested in LookaheadReasoning are comparing it to the libraries listed below.

hao-ai-lab / DistCA
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆106 · Apr 7, 2026 · Updated 3 months ago
hao-ai-lab / JetSpec
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Causal Parallel Tree Drafting
☆168 · Jun 27, 2026 · Updated 3 weeks ago
lmgame-org / GRL
Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning
☆65 · Dec 18, 2025 · Updated 7 months ago
hao-ai-lab / Awesome-Video-Attention
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach…
☆61 · Oct 27, 2025 · Updated 8 months ago
hao-ai-lab / d3LLM
[ICML 2026] d3LLM: Ultra-Fast Diffusion LLM 🚀
☆148 · May 1, 2026 · Updated 2 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training.
☆232 · May 31, 2025 · Updated last year
petuum-inc / poseidon-release
Release doc/tutorial/wheels for poseidon-tf
☆10 · Jan 18, 2018 · Updated 8 years ago
uservan / speculative_thinking
☆34 · Oct 13, 2025 · Updated 9 months ago
ray-project / pygloo
Pygloo provides Python bindings for Gloo.
☆22 · Jul 7, 2025 · Updated last year
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆146 · Nov 10, 2025 · Updated 8 months ago
ruipeterpan / specreason
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆74 · Oct 2, 2025 · Updated 9 months ago
cornserve-ai / cornserve
Easy, Fast, and Scalable Multimodal AI
☆129 · Jun 2, 2026 · Updated last month
facebookresearch / deepconf
DeepConf: Deep Think with Confidence
☆408 · Jul 17, 2026 · Updated last week
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
ASTRAL-Group / LoRe
When Reasoning Meets Its Laws
☆38 · Jan 2, 2026 · Updated 6 months ago
Jikai0Wang / Speculative_CoT
☆20 · May 14, 2025 · Updated last year
0xWelt / VibeRL
VibeRL is a Reinforcement Learning framework built essentially through vibe coding with Kimi K2.
☆17 · Updated this week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆1,010 · Updated this week
ruipeterpan / marconi
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
☆63 · Mar 5, 2025 · Updated last year
sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32 · Updated this week
JiangLiSJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
xlite-dev / qwen-image-fast
⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs
☆17 · Oct 24, 2025 · Updated 9 months ago
jiwonsong-dev / ReasoningPathCompression
[NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆32 · Oct 20, 2025 · Updated 9 months ago
zhisbug / Cavs
Cavs: An Efficient Runtime System for Dynamic Neural Networks
☆15 · Sep 18, 2020 · Updated 5 years ago
cocoa-org / NanoRollout
Scale digital agent rollouts without pain.
☆34 · Jun 18, 2026 · Updated last month
SqueezeAILab / MultipoleAttention
[NeurIPS 2025] Multipole Attention for Efficient Long Context Reasoning
☆24 · Dec 5, 2025 · Updated 7 months ago
MLSysOps / InfraGym
Empowering LLM Agents for Real-World Computer System Optimization
☆17 · Sep 10, 2025 · Updated 10 months ago
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆81 · Nov 4, 2024 · Updated last year
jaywonchung / ShadowTutor
(ICPP '20) ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference
☆12 · Jun 22, 2020 · Updated 6 years ago
chicosirius / think-or-not
☆22 · May 23, 2025 · Updated last year
KuangjuX / cuda-evolve-oss
Autonomous GPU kernel optimization system driven by AI agents.
☆31 · Mar 29, 2026 · Updated 3 months ago
yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performance drop.
☆307 · Nov 7, 2025 · Updated 8 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 3 weeks ago
LINs-lab / DeFT
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
☆54 · Jun 17, 2025 · Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29 · Sep 4, 2025 · Updated 10 months ago
stepfun-ai / StepMesh
☆380 · Jan 28, 2026 · Updated 5 months ago
ray-project / distml
Distributed ML Optimizer
☆35 · Jul 28, 2021 · Updated 4 years ago
smart-lty / nano-PEARL
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆211 · Mar 18, 2026 · Updated 4 months ago
hao-ai-lab / JacobiForcing
[ICML 2026] Jacobi Forcing: Fast and Accurate Diffusion-style Decoding
☆124 · Feb 20, 2026 · Updated 5 months ago