HYUNJS / STTM
[ICCV-2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
☆38 Updated last month
Alternatives and similar repositories for STTM
Users who are interested in STTM are comparing it to the repositories listed below.
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆132 Updated 8 months ago
- The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆52 Updated 3 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆50 Updated 2 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆68 Updated 10 months ago
- ICML2025 ☆54 Updated this week
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025] ☆73 Updated 2 months ago
- HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆39 Updated 2 weeks ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆29 Updated 2 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆97 Updated last week
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆41 Updated 3 weeks ago
- [ECCV 2024] VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement ☆33 Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆79 Updated 4 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆72 Updated 2 months ago
- Official PyTorch Code of ReKV (ICLR'25) ☆42 Updated 5 months ago
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆56 Updated 2 months ago
- ☆21 Updated 7 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆190 Updated 2 weeks ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms ☆24 Updated 2 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆38 Updated 2 months ago
- Official repository of NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understan… ☆36 Updated 7 months ago
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆89 Updated 6 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆158 Updated 3 months ago
- ☆22 Updated 2 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆105 Updated 4 months ago
- CODA: Repurposing Continuous VAEs for Discrete Tokenization ☆27 Updated 2 months ago
- ☆17 Updated 3 weeks ago
- FQGAN: Factorized Visual Tokenization and Generation ☆52 Updated 5 months ago
- [NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis ☆24 Updated 9 months ago
- ☆26 Updated 5 months ago
- Official implementation of "STAR: Scale-wise Text-to-image generation via Auto-Regressive representations" ☆36 Updated 5 months ago