[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
☆83 · Feb 27, 2025 · Updated last year
Alternatives and similar repositories for VideoLLaMB
Users that are interested in VideoLLaMB are comparing it to the libraries listed below
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆21 · Dec 22, 2025 · Updated 2 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆17 · Apr 2, 2025 · Updated 11 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆106 · Nov 28, 2024 · Updated last year
- ☆20 · Nov 28, 2024 · Updated last year
- ☆12 · Nov 13, 2024 · Updated last year
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆38 · Sep 10, 2025 · Updated 5 months ago
- [NeurIPS 2024] An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding ☆22 · Oct 10, 2024 · Updated last year
- This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams" ☆273 · Oct 15, 2025 · Updated 4 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆423 · May 8, 2025 · Updated 9 months ago
- Long Context Transfer from Language to Vision ☆402 · Mar 18, 2025 · Updated 11 months ago
- ☆80 · Nov 24, 2024 · Updated last year
- Official Implementation of Video-MA2MBA ☆12 · Dec 3, 2024 · Updated last year
- This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos" ☆13 · Aug 22, 2025 · Updated 6 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆72 · Jul 10, 2024 · Updated last year
- ☆28 · Apr 8, 2025 · Updated 10 months ago
- Official implementation of the paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆40 · Mar 16, 2025 · Updated 11 months ago
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information ☆15 · Oct 27, 2024 · Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Feb 13, 2025 · Updated last year
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench ☆113 · Jul 27, 2024 · Updated last year
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆52 · Dec 5, 2024 · Updated last year
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆46 · Apr 29, 2024 · Updated last year
- Control LLM ☆22 · Apr 6, 2025 · Updated 10 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆54 · Mar 9, 2025 · Updated 11 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆151 · Sep 10, 2024 · Updated last year
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆55 · Oct 9, 2025 · Updated 4 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 8 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆132 · Nov 19, 2024 · Updated last year
- This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR 2025) ☆26 · Sep 6, 2025 · Updated 5 months ago
- ☆109 · Dec 30, 2024 · Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆41 · Jan 26, 2026 · Updated last month
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ☆82 · Jul 4, 2025 · Updated 7 months ago
- 【NeurIPS 2024】Dense Connector for MLLMs ☆181 · Oct 14, 2024 · Updated last year
- Unifying Specialized Visual Encoders for Video Language Models ☆25 · Nov 22, 2025 · Updated 3 months ago
- [NeurIPS25] Official Implementation (PyTorch) of "DeepVideo-R1" ☆31 · Feb 22, 2026 · Updated last week
- Official implementation of the paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆42 · Feb 5, 2025 · Updated last year
- VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024) ☆640 · Nov 26, 2025 · Updated 3 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆87 · Jul 13, 2025 · Updated 7 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆122 · May 19, 2025 · Updated 9 months ago
- Official repository for the paper PLLaVA ☆676 · Jul 28, 2024 · Updated last year