THUDM / LVBench
LVBench: An Extreme Long Video Understanding Benchmark
☆79 · Updated 5 months ago
Alternatives and similar repositories for LVBench:
Users who are interested in LVBench are comparing it to the repositories listed below:
- Official repository of MMDU dataset ☆82 · Updated 4 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆109 · Updated 2 months ago
- ☆136 · Updated 2 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆97 · Updated 2 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆154 · Updated 3 months ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆84 · Updated 6 months ago
- Official implementation of the Law of Vision Representation in MLLMs ☆148 · Updated 2 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ☆62 · Updated 5 months ago
- ☆132 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆252 · Updated 7 months ago
- ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ☆98 · Updated 6 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆33 · Updated 3 months ago
- ☆59 · Updated 11 months ago
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo ☆32 · Updated 5 months ago
- ☆133 · Updated 2 weeks ago
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models ☆268 · Updated 3 months ago
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ☆202 · Updated 11 months ago
- A collection of visual instruction tuning datasets. ☆76 · Updated 10 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆85 · Updated 4 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆132 · Updated last week
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆57 · Updated last year
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆128 · Updated 2 months ago
- ☆95 · Updated last year
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆137 · Updated 4 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆113 · Updated 8 months ago
- Official code for the paper "Mantis: Multi-Image Instruction Tuning" (TMLR 2024) ☆196 · Updated this week
- ☆130 · Updated 4 months ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆83 · Updated 3 weeks ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆144 · Updated 4 months ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆223 · Updated 5 months ago