[CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
☆49Feb 25, 2026Updated 3 weeks ago
Alternatives and similar repositories for STC
Users who are interested in STC are comparing it to the repositories listed below
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆67Mar 13, 2026Updated last week
- MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation☆40Nov 4, 2025Updated 4 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆85Dec 24, 2025Updated 2 months ago
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding☆136Jan 13, 2026Updated 2 months ago
- ☆15Nov 1, 2024Updated last year
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation☆32Jun 12, 2025Updated 9 months ago
- ☆41Dec 20, 2025Updated 3 months ago
- An official implementation for "OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera"☆29Nov 6, 2025Updated 4 months ago
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning☆36Mar 12, 2026Updated last week
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs☆59Feb 2, 2026Updated last month
- Unofficial Scalable-Softmax Is Superior for Attention☆20May 30, 2025Updated 9 months ago
- ☆26May 13, 2025Updated 10 months ago
- Official inference code for SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory☆228Updated this week
- Source code of the paper: Overlapped Trajectory-Enhanced Visual Tracking☆11Sep 3, 2024Updated last year
- Split a PDF into color and black-and-white parts for easier printing☆11Mar 9, 2025Updated last year
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin…☆10Feb 9, 2025Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated 11 months ago
- ☆39Jan 1, 2026Updated 2 months ago
- (CVPR Workshop Best Paper Award) Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustn…☆17Nov 4, 2025Updated 4 months ago
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark☆27Jan 4, 2026Updated 2 months ago
- [ICDM 2022] Making Reconstruction-based Method Great Again for Video Anomaly Detection (PyTorch)☆40Mar 25, 2024Updated last year
- Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers".☆238Mar 29, 2025Updated 11 months ago
- Robust Tracking via Mamba-based Context-aware Token Learning (AAAI 2025)☆16Nov 6, 2025Updated 4 months ago
- ☆12Feb 13, 2025Updated last year
- ☆10Nov 27, 2024Updated last year
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆168Mar 23, 2025Updated 11 months ago
- survey of small language models☆18Jul 23, 2024Updated last year
- [NeurIPS 2025] FastVID: Dynamic Density Pruning for Fast Video Large Language Models☆31Nov 10, 2025Updated 4 months ago
- ☆14May 19, 2024Updated last year
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆60Updated this week
- An unofficial implementation using Pytorch for "Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types". Improve the…☆18Nov 17, 2023Updated 2 years ago
- A token pruning method that accelerates ViTs for various tasks while maintaining high performance.☆27Jul 21, 2025Updated 7 months ago
- Code of paper 'Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training'☆21Jun 10, 2025Updated 9 months ago
- Repository for SoMeLVLM: A Large Vision Language Model for Social Media Processing☆13Oct 9, 2025Updated 5 months ago
- ☆17Apr 15, 2025Updated 11 months ago
- [ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers☆19Apr 16, 2024Updated last year
- Optimizing Monocular Depth Estimation with TensorRT: Model Conversion, Inference Acceleration, and 3D Reconstruction☆40Mar 9, 2026Updated last week
- ☆14Dec 12, 2023Updated 2 years ago
- [EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"☆112Oct 12, 2025Updated 5 months ago