zju-jiyicheng/SpecVLM

[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning

☆34

Alternatives and similar repositories for SpecVLM

Users interested in SpecVLM are comparing it to the libraries listed below.

AIoT-MLSys-Lab / MEDA
[NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
☆18 · Jun 19, 2025 · Updated 8 months ago
Lou1sM / meaningful_image_complexity
☆16 · Mar 24, 2025 · Updated 11 months ago
junzhang-zj / LoRAM
[ICLR 2025] Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
☆70 · Mar 29, 2025 · Updated 11 months ago
CR400AF-A / SparseMM
[ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
☆82 · Jan 17, 2026 · Updated last month
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆67 · May 15, 2025 · Updated 9 months ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆128 · Nov 26, 2025 · Updated 3 months ago
TemporaryLoRA / Block-Attention
☆43 · Mar 15, 2025 · Updated 11 months ago
HYUNJS / STTM
[ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
☆57 · Feb 2, 2026 · Updated last month
FFY0 / DefensiveKV
Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference
☆22 · Feb 9, 2026 · Updated last month
Theia-4869 / CDPruner
[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
☆87 · Sep 20, 2025 · Updated 5 months ago
SuDIS-ZJU / nlcTables
☆15 · Jan 27, 2026 · Updated last month
G-JWLee / TAMP
☆13 · May 15, 2025 · Updated 9 months ago
hyungjin-chung / VPS
☆14 · Sep 11, 2025 · Updated 5 months ago
ACA-Lab-SJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
bscho333 / ReVisiT
☆20 · Nov 21, 2025 · Updated 3 months ago
Qinying-Liu / Awesome-omni-modal-understanding
Collection of papers about video-audio understanding
☆22 · Dec 26, 2025 · Updated 2 months ago
DoubtedSteam / DyVTE
The official implementation of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings"
☆18 · Dec 5, 2024 · Updated last year
gofreelee / SpaceServe
☆25 · Updated this week
kzkadc / regression-tta
The official implementation of "Test-time Adaptation for Regression by Subspace Alignment" (ICLR 2025).
☆15 · Jun 6, 2025 · Updated 9 months ago
AVoCaDO-Captioner / AVoCaDO
https://avocado-captioner.github.io/
☆31 · Oct 16, 2025 · Updated 4 months ago
dongxianzhe / hydrainfer
An MLLM inference engine for academic research
☆19 · Jan 30, 2026 · Updated last month
KangJialiang / ViSpec
[NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding.
☆47 · Jan 28, 2026 · Updated last month
gszfwsb / AutoGnothi
Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"
☆24 · Mar 4, 2025 · Updated last year
ZichenWen1 / EPIC
(NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation"
☆46 · Feb 11, 2026 · Updated 3 weeks ago
M1n9X / GraphRAG_Lite
☆16 · Jul 12, 2024 · Updated last year
kiaia / GIRAFFE
Extending context length of visual language models
☆12 · Dec 18, 2024 · Updated last year
jtpaulo / dedisbench
DEDISbench: A disk I/O block-based benchmark for deduplication systems. Unlike other existing benchmarks, written content is generated i…
☆14 · Jul 22, 2021 · Updated 4 years ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆161 · Oct 13, 2025 · Updated 4 months ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆66 · Nov 1, 2024 · Updated last year
sjtu-zhao-lab / ClusterKV
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25)
☆26 · Feb 26, 2026 · Updated last week
SuDIS-ZJU / llm-inference-all-in-one
☆18 · Feb 18, 2025 · Updated last year
AshleyLuo001 / UTANet
[AAAI 2025] Open-source, End-to-end, Medical Image Segmentation model by Task allocation.
☆31 · May 22, 2025 · Updated 9 months ago
DerrickYLJ / LessIsMore
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
☆29 · Sep 12, 2025 · Updated 5 months ago
Rayman96 / CAT
A fully homomorphic encryption (FHE) framework supporting end-to-end GPU acceleration
☆20 · Apr 18, 2025 · Updated 10 months ago
zoheth / yan
Yan (炎) is a high-performance CUDA operator library designed for learning purposes while emphasizing clean code and maximum performance.
☆18 · Jul 21, 2025 · Updated 7 months ago
Becomebright / ReKV
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
☆104 · Nov 4, 2025 · Updated 4 months ago
xiaoqian-shen / Vgent
[NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent
☆42 · Nov 30, 2025 · Updated 3 months ago
tmkasun / streaming_graph_partitioning
Streaming Graph Server with partitioning
☆15 · Aug 17, 2023 · Updated 2 years ago
lern-to-write / STC
[CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
☆45 · Feb 25, 2026 · Updated last week