ydyhello / TailorKV
Official implementation of "TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization" (Findings of ACL 2025).
☆20 · Updated 5 months ago
Alternatives and similar repositories for TailorKV
Users interested in TailorKV are comparing it to the libraries listed below.
- ☆10 · Updated last year
- Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Princ… ☆37 · Updated 5 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification ☆32 · Updated 9 months ago
- [NeurIPS'25] The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… ☆72 · Updated this week
- Kinetics: Rethinking Test-Time Scaling Laws ☆85 · Updated 6 months ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation ☆33 · Updated 7 months ago
- ☆53 · Updated last year
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆61 · Updated 10 months ago
- ☆62 · Updated 6 months ago
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More ☆65 · Updated 10 months ago
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs ☆43 · Updated last year
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆47 · Updated last year
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆26 · Updated 2 months ago
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆102 · Updated 6 months ago
- dParallel: Learnable Parallel Decoding for dLLMs ☆53 · Updated 2 months ago
- ☆72 · Updated 6 months ago
- Official PyTorch implementation of our paper accepted at ICLR 2024 -- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM… ☆50 · Updated last year
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆61 · Updated 3 months ago
- Official implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration ☆29 · Updated last month
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆55 · Updated last year
- Official implementation of the paper "Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models" ☆56 · Updated 2 weeks ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) ☆88 · Updated 9 months ago
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆153 · Updated last month
- Efficient LLM query routing via multi-sampling. BEST-Route selects both the model and the number of responses based on query difficulty, cutting … ☆38 · Updated 5 months ago
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models ☆128 · Updated 7 months ago
- ☆109 · Updated 3 months ago
- [ICML'24] Pruner-Zero: Evolving Symbolic Pruning Metric from Scratch for LLMs ☆98 · Updated last year
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆85 · Updated 6 months ago
- A lightweight inference engine built for block diffusion models ☆39 · Updated last month
- Source code for the paper "LongGenBench: Long-context Generation Benchmark" ☆24 · Updated last year