qijimrc / mm_evaluation
☆11 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for mm_evaluation
- Official GitHub repo of G-LLaVA ☆121 · Updated 5 months ago
- LVBench: An Extreme Long Video Understanding Benchmark ☆59 · Updated 2 months ago
- Official repository of MMDU dataset ☆74 · Updated last month
- A light-weight data management system for large-scale pretraining ☆20 · Updated 6 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆26 · Updated 3 weeks ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆106 · Updated last month
- ☆121 · Updated 2 weeks ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆45 · Updated 2 months ago
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆48 · Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ☆194 · Updated 8 months ago
- An RLHF Infrastructure for Vision-Language Models ☆98 · Updated 5 months ago
- ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation ☆93 · Updated 3 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆35 · Updated 7 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆67 · Updated 4 months ago
- ☆84 · Updated 10 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆230 · Updated 2 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆47 · Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆83 · Updated 3 weeks ago
- Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆113 · Updated last month
- Official repo for StableLLAVA ☆90 · Updated 10 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆77 · Updated 9 months ago
- ☆57 · Updated 9 months ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆72 · Updated 8 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆244 · Updated 4 months ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆178 · Updated 7 months ago
- ☆126 · Updated last week
- ☆17 · Updated 7 months ago
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆255 · Updated 8 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆98 · Updated 3 weeks ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆37 · Updated 5 months ago