dvlab-research / LSDBench
A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs.
☆15 Updated 2 months ago
Alternatives and similar repositories for LSDBench
Users who are interested in LSDBench are comparing it to the libraries listed below.
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆20 Updated 2 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆49 Updated this week
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆34 Updated 6 months ago
- ☆36 Updated last month
- Official PyTorch Code of ReKV (ICLR'25) ☆23 Updated 2 months ago
- ☆81 Updated 2 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆53 Updated 2 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆85 Updated 9 months ago
- Official code for MotionBench (CVPR 2025) ☆40 Updated 3 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆60 Updated 11 months ago
- ☆17 Updated last month
- The official implementation of JM3D. ☆30 Updated last month
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆33 Updated last month
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆41 Updated this week
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆26 Updated 8 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 11 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆18 Updated 7 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆29 Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆53 Updated last week
- ☆30 Updated 4 months ago
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆29 Updated 2 months ago
- ☆25 Updated last month
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆45 Updated 4 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆28 Updated 3 months ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆32 Updated last month
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆26 Updated 2 months ago
- ☆33 Updated 4 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆75 Updated 7 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆31 Updated 8 months ago