[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
★34 · Jan 11, 2026 · Updated last month
Alternatives and similar repositories for SpecVLM
Users who are interested in SpecVLM are comparing it to the repositories listed below.
- [NAACL 2025 🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference · ★18 · Jun 19, 2025 · Updated 8 months ago
- ★16 · Mar 24, 2025 · Updated 11 months ago
- [ICLR 2025] Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models · ★70 · Mar 29, 2025 · Updated 11 months ago
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs · ★82 · Jan 17, 2026 · Updated last month
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models · ★67 · May 15, 2025 · Updated 9 months ago
- The Official Implementation of Ada-KV [NeurIPS 2025] · ★128 · Nov 26, 2025 · Updated 3 months ago
- ★43 · Mar 15, 2025 · Updated 11 months ago
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs · ★57 · Feb 2, 2026 · Updated last month
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference · ★22 · Feb 9, 2026 · Updated last month
- [NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs. · ★87 · Sep 20, 2025 · Updated 5 months ago
- ★15 · Jan 27, 2026 · Updated last month
- ★13 · May 15, 2025 · Updated 9 months ago
- ★14 · Sep 11, 2025 · Updated 5 months ago
- ★13 · Jan 7, 2025 · Updated last year
- ★20 · Nov 21, 2025 · Updated 3 months ago
- Collection of papers about video-audio understanding · ★22 · Dec 26, 2025 · Updated 2 months ago
- The official implementation of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings" · ★18 · Dec 5, 2024 · Updated last year
- ★25 · Updated this week
- The official implementation of "Test-time Adaptation for Regression by Subspace Alignment" (ICLR 2025). · ★15 · Jun 6, 2025 · Updated 9 months ago
- https://avocado-captioner.github.io/ · ★31 · Oct 16, 2025 · Updated 4 months ago
- An MLLM inference engine for academic research · ★19 · Jan 30, 2026 · Updated last month
- [NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding. · ★47 · Jan 28, 2026 · Updated last month
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" · ★24 · Mar 4, 2025 · Updated last year
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" · ★46 · Feb 11, 2026 · Updated 3 weeks ago
- ★16 · Jul 12, 2024 · Updated last year
- Extending context length of visual language models · ★12 · Dec 18, 2024 · Updated last year
- DEDISbench: A disk I/O block-based benchmark for deduplication systems. Unlike other existing benchmarks, written content is generated i… · ★14 · Jul 22, 2021 · Updated 4 years ago
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference · ★161 · Oct 13, 2025 · Updated 4 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models · ★66 · Nov 1, 2024 · Updated last year
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25) · ★26 · Feb 26, 2026 · Updated last week
- ★18 · Feb 18, 2025 · Updated last year
- [AAAI 2025] Open-source, end-to-end medical image segmentation model by task allocation. · ★31 · May 22, 2025 · Updated 9 months ago
- Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning · ★29 · Sep 12, 2025 · Updated 5 months ago
- A fully homomorphic encryption (FHE) framework with full-pipeline GPU acceleration · ★20 · Apr 18, 2025 · Updated 10 months ago
- Yan is a high-performance CUDA operator library designed for learning purposes while emphasizing clean code and maximum performance. · ★18 · Jul 21, 2025 · Updated 7 months ago
- [ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval · ★104 · Nov 4, 2025 · Updated 4 months ago
- [NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent · ★42 · Nov 30, 2025 · Updated 3 months ago
- Streaming Graph Server with partitioning · ★15 · Aug 17, 2023 · Updated 2 years ago
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression · ★45 · Feb 25, 2026 · Updated last week