Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
☆43 | Updated this week
Alternatives and similar repositories for STC
Users who are interested in STC are comparing it to the repositories listed below.
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" | ☆41 | Feb 11, 2026 | Updated 2 weeks ago
- MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation | ☆37 | Nov 4, 2025 | Updated 3 months ago
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding | ☆96 | Jan 13, 2026 | Updated last month
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. | ☆83 | Dec 24, 2025 | Updated 2 months ago
- Building a multi-agent RAG system with advanced RAG methods | ☆12 | Jan 12, 2025 | Updated last year
- A simple exam generator and grader written in Python with OpenCV | ☆14 | Jan 14, 2026 | Updated last month
- [ICLR 2026] DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training | ☆51 | May 26, 2025 | Updated 9 months ago
- Surrogate Modeling of the Aerodynamic Performance for Transonic Regime | ☆13 | Feb 12, 2024 | Updated 2 years ago
- ☆13 | Jan 7, 2025 | Updated last year
- ☆28 | Jan 5, 2026 | Updated last month
- Repository for SoMeLVLM: A Large Vision Language Model for Social Media Processing | ☆13 | Oct 9, 2025 | Updated 4 months ago
- mouse pet-ct image segmentation | ☆12 | Feb 19, 2023 | Updated 3 years ago
- ☆14 | Dec 12, 2023 | Updated 2 years ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts | ☆17 | Apr 2, 2025 | Updated 10 months ago
- Source code of the paper: Overlapped Trajectory-Enhanced Visual Tracking | ☆11 | Sep 3, 2024 | Updated last year
- A browser based CadQuery server | ☆12 | Feb 18, 2025 | Updated last year
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning | ☆30 | Dec 17, 2025 | Updated 2 months ago
- Generate a 3D BIM Model from 2D CAD Drawings | ☆12 | Nov 23, 2022 | Updated 3 years ago
- [ICME 2025 Oral] The official implementation of TSTMotion in pytorch | ☆16 | May 31, 2025 | Updated 9 months ago
- ☆10 | Nov 27, 2024 | Updated last year
- YAICON 3rd project page - 4D Gaussian for Head Reconstruction | ☆11 | Dec 22, 2023 | Updated 2 years ago
- ☆41 | Dec 20, 2025 | Updated 2 months ago
- [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos | ☆24 | Aug 8, 2025 | Updated 6 months ago
- Toy Project: Classification and Detection of representative lung diseases, Lung Opacity and COVID-19, from X-Ray Radiography. | ☆11 | Sep 26, 2021 | Updated 4 years ago
- [AAAI2026] CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking | ☆27 | Feb 12, 2026 | Updated 2 weeks ago
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models | ☆67 | May 15, 2025 | Updated 9 months ago
- Official Implementation for our NeurIPS 2024 paper, "Don't Look Twice: Run-Length Tokenization for Faster Video Transformers". | ☆239 | Mar 29, 2025 | Updated 11 months ago
- Karras et al. (2022) diffusion models for PyTorch | ☆17 | Oct 5, 2023 | Updated 2 years ago
- YAI 11 x @POZAlabs : Improving & Evaluating Music Generation with ComMU | ☆13 | Apr 5, 2023 | Updated 2 years ago
- (ICCV 2025) OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation | ☆14 | Oct 11, 2025 | Updated 4 months ago
- (CVPR Workshop Best Paper Award) Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustn… | ☆17 | Nov 4, 2025 | Updated 3 months ago
- A survey on MM-LLMs for long video understanding: From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long… | ☆18 | Sep 12, 2025 | Updated 5 months ago
- The official implementation of "Test-time Adaptation for Regression by Subspace Alignment" (ICLR 2025). | ☆14 | Jun 6, 2025 | Updated 8 months ago
- ☆13 | Feb 5, 2025 | Updated last year
- [NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding. | ☆45 | Jan 28, 2026 | Updated last month
- [NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory | ☆140 | Nov 4, 2025 | Updated 3 months ago
- project page of "VAD v2: LLM-Like Probabilistic Modeling in End-to-End Autonomous Driving" | ☆11 | Mar 8, 2024 | Updated last year
- ☆23 | Jan 28, 2026 | Updated last month
- ☆28 | Jan 15, 2026 | Updated last month