Infini-AI-Lab/Sequoia

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Infini-AI-Lab/Sequoia)

Infini-AI-Lab / Sequoia

scalable and robust tree-based speculative decoding algorithm

☆376

Alternatives and similar repositories for Sequoia

Users that are interested in Sequoia are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Infini-AI-Lab / TriForce
View on GitHub
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281Aug 31, 2024Updated last year
Infini-AI-Lab / MagicDec
View on GitHub
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆154Dec 4, 2024Updated last year
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,471Feb 20, 2026Updated 5 months ago
thunlp / Ouroboros
View on GitHub
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆118Mar 20, 2025Updated last year
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,276Jun 27, 2026Updated 3 weeks ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
linfeng93 / BiTA
View on GitHub
An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.
☆28Apr 15, 2025Updated last year
hao-ai-lab / LookaheadDecoding
View on GitHub
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,340Mar 6, 2025Updated last year
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,757Jun 25, 2024Updated 2 years ago
flexflow / flexflow-train
View on GitHub
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,896Jul 1, 2026Updated 2 weeks ago
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401Apr 22, 2025Updated last year
dilab-zju / self-speculative-decoding
View on GitHub
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆230Feb 13, 2025Updated last year
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆850Mar 6, 2025Updated last year
Infini-AI-Lab / UMbreLLa
View on GitHub
LLM Inference on consumer devices
☆131Mar 17, 2025Updated last year
FMInference / DejaVu
View on GitHub
☆359Apr 2, 2024Updated 2 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
feifeibear / LLMSpeculativeSampling
View on GitHub
Fast inference from large lauguage models via speculative decoding
☆920Aug 22, 2024Updated last year
flexflow / flexflow-serve
View on GitHub
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆85Sep 15, 2025Updated 10 months ago
FasterDecoding / REST
View on GitHub
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220Mar 5, 2026Updated 4 months ago
sail-sg / LongSpec
View on GitHub
[ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆84Jul 14, 2025Updated last year
NJUNLP / MCSD
View on GitHub
Multi-Candidate Speculative Decoding
☆41Apr 22, 2024Updated 2 years ago
yandex-research / specexec
View on GitHub
☆68Nov 4, 2024Updated last year
hao-ai-lab / Consistency_LLM
View on GitHub
[ICML 2024] CLLMs: Consistency Large Language Models
☆416Nov 16, 2024Updated last year
hemingkx / SWIFT
View on GitHub
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆70Feb 21, 2025Updated last year
kssteven418 / BigLittleDecoder
View on GitHub
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆99Feb 6, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
microsoft / MInference
View on GitHub
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…
☆1,221Apr 8, 2026Updated 3 months ago
FMInference / H2O
View on GitHub
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆528Aug 1, 2024Updated last year
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆5,988Updated this week
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆56Aug 6, 2025Updated 11 months ago
myshell-ai / JetMoE
View on GitHub
Reaching LLaMA2 Performance with 0.1M Dollars
☆986Jul 23, 2024Updated last year
Infini-AI-Lab / Sirius
View on GitHub
Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…
☆21Sep 10, 2024Updated last year
punica-ai / punica
View on GitHub
Serving multiple LoRA finetuned LLM as one
☆1,166May 8, 2024Updated 2 years ago
lfsszd / CS-Drafting
View on GitHub
Cascade Speculative Drafting
☆33Apr 2, 2024Updated 2 years ago
mit-han-lab / Quest
View on GitHub
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆400Jul 10, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
ColfaxResearch / cutlass-kernels
View on GitHub
☆269Jul 11, 2024Updated 2 years ago
LiuXiaoxuanPKU / OSD
View on GitHub
☆68Dec 3, 2024Updated last year
S-LoRA / S-LoRA
View on GitHub
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,920Jan 21, 2024Updated 2 years ago
mirage-project / mirage
View on GitHub
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,376Updated this week
efeslab / Nanoflow
View on GitHub
A throughput-oriented high-performance serving framework for LLMs
☆968Mar 29, 2026Updated 3 months ago
d-matrix-ai / keyformer-llm
View on GitHub
Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning
☆57Mar 26, 2024Updated 2 years ago
Infini-AI-Lab / Sparrow
View on GitHub
☆16Jun 15, 2026Updated last month