[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
☆44 · Apr 16, 2026 · Updated last month
Alternatives and similar repositories for SpecVLM
Users who are interested in SpecVLM are comparing it to the repositories listed below.
- [CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models ☆30 · Updated this week
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆20 · Jun 19, 2025 · Updated 11 months ago
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs ☆60 · Feb 2, 2026 · Updated 3 months ago
- ☆17 · Mar 24, 2025 · Updated last year
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs ☆85 · Jan 17, 2026 · Updated 4 months ago
- [ICLR 2025] Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models ☆74 · Mar 29, 2025 · Updated last year
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ☆71 · May 15, 2025 · Updated last year
- ☆47 · Mar 15, 2025 · Updated last year
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆131 · Nov 26, 2025 · Updated 6 months ago
- [NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆80 · Oct 10, 2025 · Updated 7 months ago
- ☆13 · Nov 15, 2017 · Updated 8 years ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆127 · Apr 16, 2026 · Updated last month
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆111 · Nov 22, 2025 · Updated 6 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆206 · May 1, 2026 · Updated 3 weeks ago
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs". ☆17 · Sep 15, 2024 · Updated last year
- The official implement of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings" ☆18 · Dec 5, 2024 · Updated last year
- [NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs. ☆101 · Sep 20, 2025 · Updated 8 months ago
- ViLoMem: Agentic Learner with Grow-and-Refine Multimodal Semantic Memory ☆64 · Apr 21, 2026 · Updated last month
- Official repository for Activation-Informed Merging (AIM) of Large Language Models ☆23 · Feb 10, 2025 · Updated last year
- ☆10 · Dec 3, 2024 · Updated last year
- ☆13 · Jan 7, 2025 · Updated last year
- ☆16 · Sep 11, 2025 · Updated 8 months ago
- ☆35 · Jun 3, 2025 · Updated 11 months ago
- ☆13 · Jul 3, 2024 · Updated last year
- Project Page for GaussianFormer ☆24 · May 30, 2024 · Updated last year
- Incorporating the memory mechanism into the transformer and employing a parallel weighting structure to obtain a better utterance-level r… ☆22 · Oct 4, 2025 · Updated 7 months ago
- ☆17 · Apr 15, 2025 · Updated last year
- [ACL 2026 Main] Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding ☆25 · Nov 21, 2025 · Updated 6 months ago
- ☆12 · Jan 17, 2024 · Updated 2 years ago
- ☆13 · May 15, 2025 · Updated last year
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- ☆16 · Jul 12, 2024 · Updated last year
- Voronoi-Based Foveated Volume Rendering ☆10 · Sep 30, 2021 · Updated 4 years ago
- Fast and memory-efficient exact attention ☆21 · Apr 10, 2026 · Updated last month
- The official implementation of "Test-time Adaptation for Regression by Subspace Alignment" (ICLR 2025). ☆16 · Jun 6, 2025 · Updated 11 months ago
- LLaVA-Next for STVG ☆19 · Dec 5, 2025 · Updated 5 months ago
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" ☆48 · Feb 11, 2026 · Updated 3 months ago
- Layered Multiple Functional Aggregate Optimization ☆17 · Oct 8, 2020 · Updated 5 years ago
- [ACL-2026 Findings] Implementation for HiPrune, a training-free visual token pruning method for VLM acceleration. ☆55 · Apr 29, 2026 · Updated last month