[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
⭐ 34 · Jan 11, 2026 · Updated 2 months ago
Alternatives and similar repositories for SpecVLM
Users that are interested in SpecVLM are comparing it to the libraries listed below.
- [CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models · ⭐ 28 · Mar 18, 2026 · Updated last week
- [NAACL 2025 🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference · ⭐ 18 · Jun 19, 2025 · Updated 9 months ago
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs · ⭐ 58 · Feb 2, 2026 · Updated last month
- ⭐ 16 · Mar 24, 2025 · Updated last year
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs · ⭐ 82 · Jan 17, 2026 · Updated 2 months ago
- [ICLR 2025] Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models · ⭐ 74 · Mar 29, 2025 · Updated last year
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models · ⭐ 69 · May 15, 2025 · Updated 10 months ago
- ⭐ 46 · Mar 15, 2025 · Updated last year
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference · ⭐ 31 · Mar 19, 2026 · Updated last week
- The Official Implementation of Ada-KV [NeurIPS 2025] · ⭐ 128 · Nov 26, 2025 · Updated 4 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models · ⭐ 104 · Nov 22, 2025 · Updated 4 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos · ⭐ 121 · Dec 12, 2025 · Updated 3 months ago
- [ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging · ⭐ 46 · Updated this week
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… · ⭐ 200 · Nov 17, 2025 · Updated 4 months ago
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs" · ⭐ 16 · Sep 15, 2024 · Updated last year
- The official implementation of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings" · ⭐ 18 · Dec 5, 2024 · Updated last year
- ⭐ 11 · May 19, 2025 · Updated 10 months ago
- Official repository for Activation-Informed Merging (AIM) of Large Language Models · ⭐ 22 · Feb 10, 2025 · Updated last year
- ⭐ 14 · Sep 11, 2025 · Updated 6 months ago
- ⭐ 20 · Nov 21, 2025 · Updated 4 months ago
- [NeurIPS 2025] FastVID: Dynamic Density Pruning for Fast Video Large Language Models · ⭐ 32 · Nov 10, 2025 · Updated 4 months ago
- ⭐ 35 · Jun 3, 2025 · Updated 9 months ago
- ⭐ 13 · Jul 3, 2024 · Updated last year
- Project Page for GaussianFormer · ⭐ 24 · May 30, 2024 · Updated last year
- Incorporating the memory mechanism into the transformer and employing a parallel weighting structure to obtain a better utterance-level r… · ⭐ 22 · Oct 4, 2025 · Updated 5 months ago
- [NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models · ⭐ 73 · Oct 10, 2025 · Updated 5 months ago
- ⭐ 11 · Jan 17, 2024 · Updated 2 years ago
- Extending context length of visual language models · ⭐ 12 · Dec 18, 2024 · Updated last year
- ⭐ 16 · Jul 12, 2024 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs · ⭐ 122 · Jan 27, 2026 · Updated 2 months ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" · ⭐ 23 · Mar 4, 2025 · Updated last year
- https://avocado-captioner.github.io/ · ⭐ 32 · Oct 16, 2025 · Updated 5 months ago
- Fast and memory-efficient exact attention · ⭐ 21 · Mar 13, 2026 · Updated 2 weeks ago
- LLaVA-Next for STVG · ⭐ 18 · Dec 5, 2025 · Updated 3 months ago
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" · ⭐ 46 · Feb 11, 2026 · Updated last month
- ⭐ 29 · May 26, 2023 · Updated 2 years ago
- Implementation for HiPrune, a training-free visual token pruning method for VLM acceleration · ⭐ 52 · Mar 20, 2026 · Updated last week
- ⭐ 14 · Jan 20, 2025 · Updated last year
- Single-header C++20 library to remove recursion using coroutines · ⭐ 13 · Apr 17, 2020 · Updated 5 years ago