LINs-lab/DeFT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/LINs-lab/DeFT)

LINs-lab / DeFT

[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

☆54

Alternatives and similar repositories for DeFT

Users that are interested in DeFT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

PanZaifeng / FastTree-Artifact
View on GitHub
☆32Mar 24, 2025Updated last year
Infini-AI-Lab / vortex_torch
View on GitHub
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
☆68Jun 24, 2026Updated 3 weeks ago
Jingyu6 / speculative_prefill
View on GitHub
☆63May 19, 2025Updated last year
HarryWu99 / llm_kvcache_sparsity
View on GitHub
Implement some method of LLM KV Cache Sparsity
☆41Jun 6, 2024Updated 2 years ago
sspec-project / SparseSpec
View on GitHub
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆115Dec 2, 2025Updated 7 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
hyx1999 / SAM-Decoding
View on GitHub
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆52May 12, 2026Updated 2 months ago
cmd2001 / jLock
View on GitHub
Python Script to Open SJTU Dormitory Smart Lock
☆10Sep 12, 2022Updated 3 years ago
QSCTech / qsc-mobile-ios-v3
View on GitHub
Legacy Code of ZJU Campus App for iOS
☆11Jan 31, 2024Updated 2 years ago
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57Aug 6, 2025Updated 11 months ago
ACMClassOJ / Open-TesutoHime
View on GitHub
☆12Sep 4, 2021Updated 4 years ago
NonvolatileMemory / flash_tree_attn
View on GitHub
☆20Dec 24, 2024Updated last year
smart-lty / nano-PEARL
View on GitHub
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆210Mar 18, 2026Updated 4 months ago
deathwings602 / Unified-IR
View on GitHub
面向多平台编译优化的深度学习中间表示
☆10Oct 28, 2024Updated last year
PanZaifeng / RecFlex
View on GitHub
A recommendation model kernel optimizing system
☆12Jun 5, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
ulab-uiuc / diagram-eval
View on GitHub
[EMNLP 2025] DiagramEval: Evaluating LLM-Generated Diagrams via Graphs
☆17Nov 1, 2025Updated 8 months ago
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 11 months ago
LLMServe / DistServe
View on GitHub
Disaggregated serving system for Large Language Models (LLMs).
☆826Apr 6, 2025Updated last year
lwaekfjlk / Personalized-Text-Generation-Papers
View on GitHub
Collect papers related to personalized text generation
☆18Sep 6, 2021Updated 4 years ago
metacarbon / shareAtt
View on GitHub
Beyond KV Caching: Shared Attention for Efficient LLMs
☆20Jul 19, 2024Updated 2 years ago
Infini-AI-Lab / TriForce
View on GitHub
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281Aug 31, 2024Updated last year
mit-han-lab / fastrl
View on GitHub
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆174Feb 27, 2026Updated 4 months ago
Infini-AI-Lab / MagicPIG
View on GitHub
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
☆255Dec 16, 2024Updated last year
sail-sg / LongSpec
View on GitHub
[ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆84Jul 14, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
PlusLabNLP / AESOP
View on GitHub
Code for Aesop: Paraphrase Generation with Adaptive Syntactic Control (EMNLP 2021)
☆26Jan 17, 2022Updated 4 years ago
Multi-LLM / prism-research
View on GitHub
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71Mar 17, 2026Updated 4 months ago
microsoft / TileFusion
View on GitHub
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115Jun 28, 2025Updated last year
WukLab / preble
View on GitHub
Stateful LLM Serving
☆105Mar 11, 2025Updated last year
RLsys-Foundation / TritonForge
View on GitHub
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆146Nov 10, 2025Updated 8 months ago
PanZaifeng / G-SLIDE
View on GitHub
☆15Jan 7, 2022Updated 4 years ago
Scientific-Computing-Lab / STREAMer
View on GitHub
STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth
☆18Aug 21, 2023Updated 2 years ago
microsoft / sarathi-serve
View on GitHub
A low-latency & high-throughput serving engine for LLMs
☆512Jan 8, 2026Updated 6 months ago
jianuo-huang / Domino
View on GitHub
Official implementation of “Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding”.
☆121Updated this week
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401Apr 22, 2025Updated last year
infinigence / HamiltonAttention
View on GitHub
☆45Oct 15, 2025Updated 9 months ago
gudiandian / ElasticFlow
View on GitHub
☆17May 10, 2024Updated 2 years ago
tchajed / sys-verif-fa24
View on GitHub
Course website for Systems Verification Fall 2024
☆14Jul 10, 2025Updated last year
osayamenja / FlashMoE
View on GitHub
Distributed MoE in a Single Kernel [NeurIPS '25]
☆272May 5, 2026Updated 2 months ago
thunlp / FR-Spec
View on GitHub
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆55Jul 15, 2025Updated last year
cornell-brg / pyhdl-eval
View on GitHub
LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs
☆18Aug 26, 2024Updated last year