Xiuyuan-Chen / AutoEval-Video
☆35 · Updated last year
Alternatives and similar repositories for AutoEval-Video:
Users interested in AutoEval-Video are comparing it to the libraries listed below.
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆23 · Updated 3 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆102 · Updated last year
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆66 · Updated last month
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆41 · Updated 9 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 · Updated 7 months ago
- [arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆31 · Updated 3 months ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆27 · Updated 9 months ago
- [ICLR 2024] Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆43 · Updated 3 months ago
- [ECCV 2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆90 · Updated 8 months ago
- Solving catastrophic forgetting in LMMs (AAAI 2025) ☆40 · Updated 4 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆108 · Updated last month
- [CVPR 2025] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆63 · Updated last week
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 · Updated last year
- Official implementation of MIA-DPO ☆54 · Updated 2 months ago
- Language Repository for Long Video Understanding ☆31 · Updated 9 months ago
- [ACL 2023] MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 7 months ago
- [NeurIPS 2024] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 · Updated 9 months ago
- The codebase for our EMNLP 2024 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆74 · Updated 2 months ago
- Preference Learning for LLaVA ☆41 · Updated 4 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆58 · Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆37 · Updated 4 months ago
- Official repo for StableLLAVA ☆95 · Updated last year
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆40 · Updated last week