KangJialiang/ViSpec

[NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding.

☆65

Alternatives and similar repositories for ViSpec

Users who are interested in ViSpec are comparing it to the libraries listed below.

Lyn-Lucy / MSD
☆38 · Jul 21, 2025 · Updated last year
zju-jiyicheng / SpecVLM
[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
☆48 · Apr 16, 2026 · Updated 3 months ago
killthefullmoon / MMSpec
MMSpec: Benchmarking Speculative Decoding for Vision-Language Models
☆41 · Jul 2, 2026 · Updated 3 weeks ago
zju-jiyicheng / LVSpec
[ACL 2026 Main] See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video …
☆27 · Jul 4, 2026 · Updated 3 weeks ago
naimengye / speculative-action
☆30 · Mar 9, 2026 · Updated 4 months ago
hyx1999 / SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆52 · May 12, 2026 · Updated 2 months ago
HArmonizedSS / HASS
Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
☆56 · Mar 14, 2025 · Updated last year
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆170 · Oct 13, 2025 · Updated 9 months ago
wangqinsi1 / 2025-ICML-CoreMatching
[ICML 2025] CoreMatching: Co-adaptive Sparse Inference Framework for Comprehensive Acceleration of Vision Language Model
☆16 · May 27, 2025 · Updated last year
smart-lty / nano-PEARL
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆211 · Mar 18, 2026 · Updated 4 months ago
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆402 · Apr 22, 2025 · Updated last year
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆1,018 · Updated this week
hemingkx / SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,283 · Jun 27, 2026 · Updated last month
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆170 · Dec 23, 2025 · Updated 7 months ago
Odysseusq / VLCache
Official Repo for paper "VLCache: Computing 2% Vision Tokens and Reusing 98% for Vision-Language Inference"
☆16 · Mar 28, 2026 · Updated 4 months ago
mit-han-lab / fastrl
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆174 · Feb 27, 2026 · Updated 5 months ago
Theia-4869 / VisPruner
[ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
☆84 · Jul 1, 2025 · Updated last year
boschresearch / FedTPG
Code for the ICLR 2024 paper Federated Text-driven Prompt Generation for Vision-Language Models (https://openreview.net/forum?id=NW31gAyl…
☆27 · May 7, 2024 · Updated 2 years ago
w-yibo / VTC-R1
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.
☆26 · Jul 20, 2026 · Updated last week
LiuXiaoxuanPKU / OSD
☆68 · Dec 3, 2024 · Updated last year
thunlp / FR-Spec
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆55 · Jul 15, 2025 · Updated last year
Theia-4869 / CDPruner
[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
☆106 · Sep 20, 2025 · Updated 10 months ago
AI9Stars / SpecMQuant
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
☆23 · May 29, 2025 · Updated last year
foreverlasting1202 / QuestA
☆22 · Jan 2, 2026 · Updated 6 months ago
xlang-ai / VideoAgentTrek
The official repo of VideoAgentTrek
☆58 · Oct 24, 2025 · Updated 9 months ago
kaist-ina / specedge
Scalable Edge-Assisted Serving Framework for Interactive LLMs [NeurIPS 2025 Spotlight]
☆28 · Nov 14, 2025 · Updated 8 months ago
MAC-AutoML / SpecEyes
[ECCV 2026🔥] This is the official implementation of our paper "SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception…
☆62 · Apr 2, 2026 · Updated 3 months ago
xuyang-liu16 / V2Drop
[CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models
☆34 · May 27, 2026 · Updated 2 months ago
ZichenWen1 / DART
[EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
☆121 · Oct 12, 2025 · Updated 9 months ago
LINs-lab / DeFT
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
☆54 · Jun 17, 2025 · Updated last year
ShopeeLLM / Spec-RL
SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
☆66 · Dec 1, 2025 · Updated 7 months ago
chenzx921020 / MoEQuant
☆17 · Apr 7, 2025 · Updated last year
SuDIS-ZJU / Efficient-LVLMs-Inference
[ACL 2026 Findings] Living repository for the survey paper “Efficient Inference for Large Vision-Language Models: Bottlenecks, Techniques…
☆26 · Apr 8, 2026 · Updated 3 months ago
viridisGreen / EarlyTom
[CVPR 2026] EarlyTom: Early Token Compression Completes Fast Video Understanding
☆34 · Jun 22, 2026 · Updated last month
facebookresearch / ParetoQ
This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"
☆131 · Oct 15, 2025 · Updated 9 months ago
guanyilin428 / Dynamic-Speculative-Planning
☆48 · Sep 13, 2025 · Updated 10 months ago
Qualcomm-AI-research / dynamic-sparsity
☆16 · Mar 26, 2025 · Updated last year
EffiVLM-Bench / EffiVLM-Bench
☆35 · Jun 3, 2025 · Updated last year
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆23 · Mar 25, 2026 · Updated 4 months ago