ydyhello / TailorKV
Official implementation of "TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization" (Findings of ACL 2025).
☆15 · Updated last month
Alternatives and similar repositories for TailorKV
Users interested in TailorKV are comparing it to the libraries listed below.
- Official implementation of the paper "Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models" ☆14 · Updated 10 months ago
- ☆10 · Updated 10 months ago
- ☆51 · Updated last week
- ☆21 · Updated 2 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆21 · Updated last year
- PyTorch implementation of the ICML 2024 paper "CaM: Cache Merging for Memory-efficient LLMs Inference" ☆42 · Updated last year
- Official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing" ☆39 · Updated this week
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆118 · Updated last week
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆52 · Updated 4 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆99 · Updated last week
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆31 · Updated last year
- ☆32 · Updated 3 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆40 · Updated last year
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ☆31 · Updated 2 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆39 · Updated 3 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆86 · Updated last month
- ☆75 · Updated last week
- Recent Advances on MLLM's Reasoning Ability ☆24 · Updated 3 months ago
- ☆18 · Updated 4 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆34 · Updated 7 months ago
- Source code for the paper "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs" ☆37 · Updated 11 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆77 · Updated 5 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆94 · Updated last year
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters ☆35 · Updated 4 months ago
- Official PyTorch implementation of the ICLR 2024 paper "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…" ☆49 · Updated last year
- Code for the ICML 2025 paper "LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection" 🎉 ☆25 · Updated last month
- ☆88 · Updated last month
- [ICCV 2025] Official code for the paper "Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs" ☆11 · Updated 2 weeks ago
- Self-reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL) ☆17 · Updated last year
- Kinetics: Rethinking Test-Time Scaling Laws ☆65 · Updated last week