HYUNJS / STTM
[ICCV-2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
☆34Updated 2 weeks ago
Alternatives and similar repositories for STTM
Users who are interested in STTM are comparing it to the libraries listed below
- HoliTom: Holistic Token Merging for Fast Video Large Language Models☆39Updated 2 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆48Updated last month
- Official PyTorch Code of ReKV (ICLR'25)☆38Updated 5 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆131Updated 7 months ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆28Updated 2 months ago
- The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".☆51Updated 2 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆64Updated 4 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆151Updated 2 months ago
- ICML2025☆52Updated this week
- [ECCV 2024] VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement☆33Updated last year
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆81Updated this week
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"☆65Updated 9 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models☆68Updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆169Updated last week
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆105Updated 4 months ago
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆76Updated 3 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆31Updated 2 weeks ago
- [NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis☆24Updated 8 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆90Updated 3 months ago
- Official repository of NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understan…☆36Updated 6 months ago
- Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting☆48Updated last month
- Transactions on Multimedia (TMM25)☆15Updated 4 months ago
- Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation☆52Updated 2 months ago
- CODA: Repurposing Continuous VAEs for Discrete Tokenization☆24Updated last month
- Official implementation of "STAR: Scale-wise Text-to-image generation via Auto-Regressive representations"☆36Updated 5 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆45Updated 2 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆35Updated last month
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]☆72Updated last month