[CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
★61 · Feb 25, 2026 · Updated 2 months ago
Alternatives and similar repositories for STC
Users that are interested in STC are comparing it to the libraries listed below.
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" ★48 · Feb 11, 2026 · Updated 2 months ago
- MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation ★45 · Nov 4, 2025 · Updated 5 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ★88 · Dec 24, 2025 · Updated 4 months ago
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ★124 · Apr 10, 2026 · Updated 2 weeks ago
- [ICLR 2026] DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training ★52 · May 26, 2025 · Updated 11 months ago
- ★15 · Nov 1, 2024 · Updated last year
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ★32 · Jun 12, 2025 · Updated 10 months ago
- ★42 · Dec 20, 2025 · Updated 4 months ago
- ★14 · Jun 16, 2023 · Updated 2 years ago
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding ★212 · Apr 20, 2026 · Updated last week
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs ★59 · Feb 2, 2026 · Updated 2 months ago
- Unofficial Scalable-Softmax Is Superior for Attention ★20 · May 30, 2025 · Updated 10 months ago
- An official implementation for "OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera" ★32 · Nov 6, 2025 · Updated 5 months ago
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning ★37 · Mar 12, 2026 · Updated last month
- ★28 · May 13, 2025 · Updated 11 months ago
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin… ★10 · Feb 9, 2025 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ★17 · Apr 2, 2025 · Updated last year
- (CVPR Workshop Best Paper Award) Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustn… ★19 · Nov 4, 2025 · Updated 5 months ago
- [ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction ★13 · Apr 21, 2020 · Updated 6 years ago
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark ★27 · Apr 4, 2026 · Updated 3 weeks ago
- [ICDM 2022] Making Reconstruction-based Method Great Again for Video Anomaly Detection (PyTorch) ★40 · Mar 25, 2024 · Updated 2 years ago
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning ★29 · Jan 14, 2026 · Updated 3 months ago
- Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers". ★238 · Mar 29, 2025 · Updated last year
- Robust Tracking via Mamba-based Context-aware Token Learning (AAAI 2025) ★16 · Nov 6, 2025 · Updated 5 months ago
- ★12 · Feb 13, 2025 · Updated last year
- ★10 · Nov 27, 2024 · Updated last year
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ★71 · May 15, 2025 · Updated 11 months ago
- ★13 · Jan 7, 2025 · Updated last year
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ★172 · Mar 23, 2025 · Updated last year
- (ICCV 2025) OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation ★16 · Oct 11, 2025 · Updated 6 months ago
- Survey of small language models ★18 · Jul 23, 2024 · Updated last year
- [NeurIPS 2025] FastVID: Dynamic Density Pruning for Fast Video Large Language Models ★34 · Nov 10, 2025 · Updated 5 months ago
- ★14 · May 19, 2024 · Updated last year
- Official PyTorch Implementation of "Better Source, Better Flow: Learning Condition-Dependent Source Distribution for Flow Matching" ★31 · Mar 1, 2026 · Updated last month
- An unofficial implementation using PyTorch for "Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types". Improve the… ★18 · Nov 17, 2023 · Updated 2 years ago
- A token pruning method that accelerates ViTs for various tasks while maintaining high performance. ★27 · Jul 21, 2025 · Updated 9 months ago
- Code of paper 'Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training' ★21 · Jun 10, 2025 · Updated 10 months ago
- [TCSVT2025] MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking ★24 · Apr 6, 2025 · Updated last year
- Repository for SoMeLVLM: A Large Vision Language Model for Social Media Processing ★14 · Oct 9, 2025 · Updated 6 months ago