abdelfattah-lab/TokenButler

☆27

Alternatives and similar repositories for TokenButler

Users that are interested in TokenButler are comparing it to the libraries listed below.

abdelfattah-lab / xKV
xKV: Cross-Layer SVD for KV-Cache Compression [ICML 2026]
☆53 · Jul 7, 2026 · Updated 2 weeks ago
chenyaofo / CCA-Attention
☆20 · Aug 14, 2025 · Updated 11 months ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆400 · Jul 10, 2025 · Updated last year
abdelfattah-lab / smcsd
Sequential Monte Carlo Speculative Decoding
☆52 · Updated this week
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54 · Oct 18, 2024 · Updated last year
enyac-group / Elana
Elana: A Simple Energy & Latency Analyzer for LLMs
☆16 · Apr 3, 2026 · Updated 3 months ago
enyac-group / UniQL
UniQL official repository (ICLR 2026)
☆16 · Jan 27, 2026 · Updated 5 months ago
JungHoyoun / PromptCompressor
☆12 · Apr 29, 2024 · Updated 2 years ago
princeton-pli / PruLong
Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?"
☆48 · Jul 29, 2025 · Updated 11 months ago
Tianshi-Xu / PrivCirNet
[NeurIPS'24] Official implement of "PrivCirNet: Efficient Private Inference via Block Circulant Transformation"
☆14 · Feb 26, 2026 · Updated 4 months ago
huangyuxiang03 / Locret
☆14 · Oct 3, 2024 · Updated last year
zijian678 / TDD
☆14 · Apr 22, 2024 · Updated 2 years ago
shiweijiezero / R3L
☆23 · Apr 5, 2026 · Updated 3 months ago
Jingyu6 / speculative_prefill
☆63 · May 19, 2025 · Updated last year
snu-mllab / Context-Memory
Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆64 · Apr 18, 2024 · Updated 2 years ago
kaistAI / knowledge-reasoning
[EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut…
☆23 · Dec 4, 2024 · Updated last year
TAU-MLwell / Set-Tree
Official repository for the paper: "Trees with Attention for Set Prediction Tasks" (ICML21)
☆10 · Jan 19, 2022 · Updated 4 years ago
wutaiqiang / LLM_KD_AKL
☆22 · Oct 22, 2024 · Updated last year
PKU-SEC-Lab / mpcvit
Code release for MPCViT accepted by ICCV 2023
☆16 · Jan 6, 2025 · Updated last year
DerrickYLJ / LessIsMore
[ICML 2026] Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
☆34 · Sep 12, 2025 · Updated 10 months ago
zhangxy-2019 / RetroAgent
RETROAGENT: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
☆26 · Mar 30, 2026 · Updated 3 months ago
GATECH-EIC / ShiftAddViT
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30 · Dec 6, 2023 · Updated 2 years ago
diningphil / continual_learning_for_graphs
☆13 · Feb 16, 2021 · Updated 5 years ago
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
LLMkvsys / rethink-kv-compression
☆24 · Mar 7, 2025 · Updated last year
wejoncy / sfllm
Super fast serving stack for LLM on Windows/Linux/Macos
☆17 · Dec 17, 2025 · Updated 7 months ago
GSYfate / knnlm-limits
Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs"
☆24 · Apr 30, 2025 · Updated last year
bigai-nlco / CREAM
[NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
☆22 · Oct 10, 2024 · Updated last year
sjtu-zhao-lab / ClusterKV
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25)
☆32 · Feb 26, 2026 · Updated 4 months ago
abdelfattah-lab / attamba
☆13 · Nov 29, 2024 · Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆127 · Jan 27, 2026 · Updated 5 months ago
namespace-Pt / UltraGist
☆18 · Dec 2, 2024 · Updated last year
UMass-Embodied-AGI / BudgetGuidance
[ACL'26 Findings] Steering LLM Thinking with Budget Guidance
☆32 · Feb 19, 2026 · Updated 5 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆157 · Apr 7, 2025 · Updated last year
UmeanNever / RankSurprisalRatio
[ACL 2026 Main] Official Repo for Paper "Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Ali…
☆17 · Jul 1, 2026 · Updated 2 weeks ago
Dao-AILab / grouped-latent-attention
☆135 · May 29, 2025 · Updated last year
chtmp223 / suri
Suri: Multi-constraint instruction following for long-form text generation [EMNLP’24]
☆27 · Oct 3, 2025 · Updated 9 months ago
jy-yuan / KIVI
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆418 · Nov 20, 2025 · Updated 8 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆183 · Jul 12, 2024 · Updated 2 years ago