StargazerX0 / ScaleKV
ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
☆39Updated last week
Alternatives and similar repositories for ScaleKV
Users who are interested in ScaleKV are comparing it to the repositories listed below
- ☆74Updated 2 weeks ago
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality"☆47Updated 2 months ago
- [CVPR 2025 Highlight] TinyFusion: Diffusion Transformers Learned Shallow☆120Updated 2 months ago
- VeriThinker: Learning to Verify Makes Reasoning Model Efficient☆41Updated last week
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…☆72Updated last week
- The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"☆24Updated this week
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching☆104Updated 10 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆60Updated last week
- FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.☆46Updated 10 months ago
- ✈️ Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints☆67Updated 2 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆24Updated 3 months ago
- Triton implementation of bi-directional (non-causal) linear attention☆49Updated 4 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring☆160Updated this week
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification☆21Updated 2 months ago
- An auxiliary project analyzing the characteristics of KV in DiT attention.☆30Updated 6 months ago
- ☆31Updated last month
- A Collection of Papers on Diffusion Language Models☆60Updated this week
- Paper list, tutorial, and nano code snippets for Diffusion Large Language Models.☆51Updated last week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆69Updated 3 months ago
- IEAP: Image Editing As Programs with Diffusion Models☆53Updated this week
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆31Updated 3 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆102Updated 2 months ago
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference☆39Updated 11 months ago
- (ToCa-v2) A new version of ToCa, with faster speed and better acceleration!☆37Updated 2 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…☆60Updated last year
- Data distillation benchmark☆64Updated this week
- Autoregressive Image Generation with Randomized Parallel Decoding☆64Updated 2 months ago
- ☆13Updated 2 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆97Updated 4 months ago
- ☆111Updated last week