Long Context Transfer from Language to Vision
⭐402 · Mar 18, 2025 · Updated 11 months ago
Alternatives and similar repositories for LongVA
Users that are interested in LongVA are comparing it to the libraries listed below
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture · ⭐213 · Jan 6, 2025 · Updated last year
- 🔥🔥MLVU: Multi-task Long Video Understanding Benchmark · ⭐241 · Aug 21, 2025 · Updated 6 months ago
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · ⭐1,277 · Jan 23, 2025 · Updated last year
- ⭐155 · Oct 31, 2024 · Updated last year
- ⭐4,577 · Sep 14, 2025 · Updated 5 months ago
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · ⭐731 · Dec 8, 2025 · Updated 2 months ago
- 🔥🔥First-ever hour-scale video understanding models · ⭐610 · Jul 14, 2025 · Updated 7 months ago
- Official repository for the paper PLLaVA · ⭐676 · Jul 28, 2024 · Updated last year
- [ICML 2025] Official PyTorch implementation of LongVU · ⭐423 · May 8, 2025 · Updated 9 months ago
- [ICLR 2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling · ⭐510 · Nov 18, 2025 · Updated 3 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. · ⭐1,986 · Nov 7, 2025 · Updated 3 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… · ⭐3,766 · Nov 28, 2025 · Updated 3 months ago
- ⭐32 · Jul 29, 2024 · Updated last year
- A lightweight, flexible Video-MLLM developed by the Tencent QQ Multimedia Research Team. · ⭐74 · Oct 14, 2024 · Updated last year
- [NeurIPS 2024 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. · ⭐113 · Jul 27, 2024 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs · ⭐54 · Mar 9, 2025 · Updated 11 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" · ⭐150 · Sep 10, 2024 · Updated last year
- One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks · ⭐3,707 · Updated this week
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding · ⭐686 · Jan 29, 2025 · Updated last year
- [CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · ⭐346 · Jul 19, 2024 · Updated last year
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding · ⭐409 · May 8, 2025 · Updated 9 months ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions · ⭐2,921 · May 26, 2025 · Updated 9 months ago
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges · ⭐83 · Feb 27, 2025 · Updated last year
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark · ⭐137 · Jul 9, 2025 · Updated 7 months ago
- LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024) · ⭐859 · Jul 29, 2024 · Updated last year
- EVE Series: Encoder-Free Vision-Language Models from BAAI · ⭐368 · Jul 24, 2025 · Updated 7 months ago
- [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the cap… · ⭐1,492 · Aug 5, 2025 · Updated 6 months ago
- This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams" · ⭐271 · Oct 15, 2025 · Updated 4 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … · ⭐129 · Apr 4, 2025 · Updated 10 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos · ⭐46 · Apr 29, 2024 · Updated last year
- Eagle: Frontier Vision-Language Models with Data-Centric Strategies · ⭐929 · Oct 25, 2025 · Updated 4 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant · ⭐69 · Jun 9, 2024 · Updated last year
- [ECCV 2024] Video Foundation Models & Data for Multimodal Understanding · ⭐2,201 · Dec 15, 2025 · Updated 2 months ago
- Official repo for the paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" · ⭐506 · Sep 2, 2024 · Updated last year
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content · ⭐603 · Oct 6, 2024 · Updated last year
- Code for the CVPR 2025 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" · ⭐154 · Jun 23, 2025 · Updated 8 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input · ⭐67 · Aug 30, 2024 · Updated last year
- Official Repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding · ⭐293 · Aug 5, 2025 · Updated 6 months ago
- ⭐109 · Dec 30, 2024 · Updated last year