Xiuyuan-Chen / AutoEval-Video
☆33Updated 7 months ago
Related projects: ⓘ
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆75Updated 3 weeks ago
- An benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆32Updated 10 months ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆21Updated 2 months ago
- ☆53Updated 7 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆73Updated 2 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆36Updated 2 months ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆67Updated 5 months ago
- ☆46Updated 10 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds☆80Updated 2 months ago
- [CVPR2024] This is the official implement of MP5☆72Updated 2 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆39Updated last week
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆99Updated 10 months ago
- Official Dataloader and Evaluation Scripts for LongVideoBench.☆52Updated last month
- Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆44Updated 3 months ago
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning☆93Updated 2 months ago
- Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆34Updated 2 months ago
- ☆101Updated 5 months ago
- ☆83Updated 9 months ago
- Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆38Updated last year
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆24Updated 3 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated last month
- ☆128Updated 8 months ago
- This is the official implementation of the paper "Needle In A Multimodal Haystack"☆72Updated 2 months ago
- Dense Connector for MLLMs☆98Updated last month
- Official repository of MMDU dataset☆61Updated last month
- This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Co…☆66Updated 2 months ago
- ☆80Updated 4 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆45Updated this week
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆44Updated 8 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions☆112Updated 2 months ago