Official implementation of "TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization" (Findings of ACL 2025).
☆21 · Jul 25, 2025 · Updated 10 months ago
Alternatives and similar repositories for TailorKV
Users who are interested in TailorKV are comparing it to the repositories listed below.
- Residual vector quantization for KV cache compression in large language model ☆12 · Oct 22, 2024 · Updated last year
- [NAACL 2025] Official Code Repository for the paper "Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval" ☆22 · Jul 13, 2025 · Updated 10 months ago
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference ☆91 · Dec 7, 2025 · Updated 5 months ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆60 · Nov 20, 2024 · Updated last year
- ☆47 · Nov 25, 2024 · Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆53 · Dec 17, 2024 · Updated last year
- InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference ☆17 · Mar 30, 2025 · Updated last year
- [CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant ☆30 · Dec 2, 2025 · Updated 5 months ago
- ☆13 · Jan 14, 2026 · Updated 4 months ago
- ☆53 · May 13, 2024 · Updated 2 years ago
- ☆11 · Oct 31, 2021 · Updated 4 years ago
- Source code for ComNet paper: Satellite multi-beam multicast support for an efficient community-based CDN ☆10 · Jul 26, 2022 · Updated 3 years ago
- [ICCV 2025] Implementation of the paper "Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs" ☆77 · Oct 25, 2025 · Updated 7 months ago
- ☆11 · Sep 6, 2024 · Updated last year
- [EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering ☆18 · Oct 31, 2024 · Updated last year
- ☆12 · Jul 16, 2020 · Updated 5 years ago
- Federated Unfolding Learning for CSI Feedback in Distributed Edge Networks ☆10 · Jul 7, 2024 · Updated last year
- Matlab scripts for the paper "Machine Learning meets Stochastic Geometry: Determinantal Subset Selection for Wireless Networks" ☆12 · May 4, 2019 · Updated 7 years ago
- [code] "A Markovian Model for Analyzing Opportunistic Request Routing in Wireless Cache Networks" by J. Dinal Herath and Anand Seetharam.… ☆10 · Feb 27, 2019 · Updated 7 years ago
- ☆314 · Jul 10, 2025 · Updated 10 months ago
- LLM-Based Multi-Agent Situation Awareness ☆17 · Apr 7, 2026 · Updated last month
- ☆12 · Aug 24, 2023 · Updated 2 years ago
- ☆18 · Apr 15, 2025 · Updated last year
- This is the CUDA GPU implementation + Python interface (using PyTorch) of DCI. The paper can be found at https://arxiv.org/abs/1512.00442… ☆13 · Dec 20, 2023 · Updated 2 years ago
- Simulation codes for over-the-air federated learning via second-order optimization ☆14 · Jan 27, 2022 · Updated 4 years ago
- Modular and structured prompt caching for low-latency LLM inference ☆113 · Nov 9, 2024 · Updated last year
- Artifacts Release: A Case for Stateless Mobile Core Network Functions in Space ☆16 · Aug 16, 2022 · Updated 3 years ago
- [CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering ☆22 · Sep 21, 2024 · Updated last year
- Running inference on the ZeroSCROLLS benchmark ☆22 · Apr 18, 2024 · Updated 2 years ago
- ☆43 · Oct 11, 2025 · Updated 7 months ago
- Matlab code associated with the publication "Load Modulation for Backscatter Communication: Channel Capacity and Capacity-Approaching Fin… ☆14 · Nov 15, 2022 · Updated 3 years ago
- MATLAB codes for "Joint channel estimation and user grouping for massive MIMO systems" ☆14 · May 20, 2022 · Updated 4 years ago
- This is the code for paper “accelerating communication-efficient federated multi-task learning with personalization and Fairness”. Besid… ☆11 · May 11, 2025 · Updated last year
- This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models. ☆237 · Jul 7, 2025 · Updated 10 months ago
- Paper notes documenting Transformer upgrades ☆19 · Jun 25, 2023 · Updated 2 years ago
- [ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading ☆24 · Jan 6, 2026 · Updated 4 months ago
- ☆18 · Jan 1, 2023 · Updated 3 years ago
- Programming source codes of M. Shojafar (papers, reports, patents) ☆15 · Mar 15, 2023 · Updated 3 years ago