[CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
★65 · Feb 25, 2026 · Updated 2 months ago
Alternatives and similar repositories for STC
Users that are interested in STC are comparing it to the libraries listed below.
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" ★48 · Feb 11, 2026 · Updated 3 months ago
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs ★59 · Feb 2, 2026 · Updated 3 months ago
- MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation ★45 · Nov 4, 2025 · Updated 6 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ★91 · Dec 24, 2025 · Updated 4 months ago
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ★124 · Updated this week
- ★15 · Nov 1, 2024 · Updated last year
- [ICLR 2026] DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training ★52 · May 26, 2025 · Updated 11 months ago
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ★32 · Jun 12, 2025 · Updated 11 months ago
- ★14 · Jun 16, 2023 · Updated 2 years ago
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding ★236 · May 7, 2026 · Updated last week
- Unofficial Scalable-Softmax Is Superior for Attention ★20 · May 30, 2025 · Updated 11 months ago
- ★28 · May 13, 2025 · Updated last year
- Source code of the paper: Overlapped Trajectory-Enhanced Visual Tracking ★11 · Sep 3, 2024 · Updated last year
- Split a PDF into color and black-and-white parts for easier printing ★11 · Mar 9, 2025 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ★17 · Apr 2, 2025 · Updated last year
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin… ★10 · Feb 9, 2025 · Updated last year
- (CVPR Workshop Best Paper Award) Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustn… ★19 · Nov 4, 2025 · Updated 6 months ago
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning ★38 · Mar 12, 2026 · Updated 2 months ago
- [ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction ★13 · Apr 21, 2020 · Updated 6 years ago
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark ★27 · Apr 4, 2026 · Updated last month
- ★11 · May 19, 2025 · Updated last year
- ★165 · Apr 27, 2026 · Updated 3 weeks ago
- [ICDM 2022] Making Reconstruction-based Method Great Again for Video Anomaly Detection (PyTorch) ★40 · Mar 25, 2024 · Updated 2 years ago
- Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers". ★238 · Mar 29, 2025 · Updated last year
- Robust Tracking via Mamba-based Context-aware Token Learning (AAAI 2025) ★16 · Nov 6, 2025 · Updated 6 months ago
- ★12 · Feb 13, 2025 · Updated last year
- ★10 · Nov 27, 2024 · Updated last year
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ★71 · May 15, 2025 · Updated last year
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning ★35 · Jan 14, 2026 · Updated 4 months ago
- YAICON 3rd project page - 4D Gaussian for Head Reconstruction ★11 · Dec 22, 2023 · Updated 2 years ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ★174 · Mar 23, 2025 · Updated last year
- Survey of small language models ★18 · Jul 23, 2024 · Updated last year
- [NeurIPS 2025] FastVID: Dynamic Density Pruning for Fast Video Large Language Models ★35 · Nov 10, 2025 · Updated 6 months ago
- Official PyTorch Implementation of "Better Source, Better Flow: Learning Condition-Dependent Source Distribution for Flow Matching" ★31 · Mar 1, 2026 · Updated 2 months ago
- A token pruning method that accelerates ViTs for various tasks while maintaining high performance. ★28 · Jul 21, 2025 · Updated 9 months ago
- [TCSVT2025] MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking ★24 · Apr 6, 2025 · Updated last year
- Repository for SoMeLVLM: A Large Vision Language Model for Social Media Processing ★14 · Oct 9, 2025 · Updated 7 months ago
- [ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers ★19 · Apr 16, 2024 · Updated 2 years ago
- ★14 · Dec 12, 2023 · Updated 2 years ago