snu-mllab / KVzip
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆198 · Updated this week
Alternatives and similar repositories for KVzip
Users interested in KVzip are comparing it to the repositories listed below.
- Measuring Thinking Efficiency in Reasoning Models - Research Repository — ☆39 · Updated Dec 2, 2025
- A simple, "Ollama-like" tool for managing and running GGUF language models from your terminal. — ☆23 · Updated Jan 2, 2026
- Various LLM Benchmarks — ☆24 · Updated Oct 22, 2025
- ☆17 · Updated Aug 5, 2025
- ☆15 · Updated Apr 9, 2025
- Clipboard Regex Replace is a lightweight GoLang application that allows you to automatically apply regex-based replacements to your clipb… — ☆10 · Updated Jan 20, 2026
- ☆25 · Updated Feb 6, 2026
- ☆18 · Updated Dec 9, 2025
- ☆15 · Updated Sep 11, 2025
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs — ☆23 · Updated Sep 21, 2025
- Analyze Reddit posts — ☆29 · Updated Feb 27, 2025
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…" — ☆104 · Updated Nov 9, 2024
- Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. — ☆18 · Updated Jan 10, 2025
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" — ☆54 · Updated Oct 9, 2025
- Fulloch - The Fully Local Home Voice Assistant — ☆39 · Updated this week
- Collection of papers about video-audio understanding — ☆22 · Updated Dec 26, 2025
- L2E llama2.c on Commodore C-64 — ☆18 · Updated Feb 22, 2025
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) — ☆30 · Updated Jun 14, 2024
- The evaluation framework for training-free sparse attention in LLMs — ☆117 · Updated Jan 27, 2026
- Official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" — ☆52 · Updated Oct 18, 2024
- Fast, memory-efficient attention column reduction (e.g., sum, mean, max) — ☆34 · Updated this week
- ☆31 · Updated Nov 18, 2025
- ☆12 · Updated Apr 4, 2024
- ☆16 · Updated Nov 29, 2024
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs — ☆23 · Updated Sep 23, 2025
- 🚀 FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GP… — ☆50 · Updated Nov 26, 2025
- Extending context length of visual language models — ☆12 · Updated Dec 18, 2024
- Simple node proxy for llama-server that enables MCP use — ☆17 · Updated May 10, 2025
- LLMProxy is an intelligent large language model backend routing proxy service. — ☆22 · Updated Dec 6, 2025
- Steering LLM Thinking with Budget Guidance — ☆28 · Updated Aug 10, 2025
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache — ☆43 · Updated Jul 26, 2024
- (ACL 2025 Oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation — ☆34 · Updated May 28, 2025
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models — ☆38 · Updated Jan 27, 2026
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning — ☆60 · Updated Oct 24, 2025
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). — ☆658 · Updated Sep 30, 2025
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs — ☆56 · Updated Feb 2, 2026
- Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends — ☆51 · Updated Aug 21, 2025
- Official repository for K-EXAONE built by LG AI Research — ☆66 · Updated Feb 6, 2026
- A comprehensive and efficient long-context model evaluation framework — ☆30 · Updated this week