qijimrc / mm_evaluation
☆11 · Updated last year
Alternatives and similar repositories for mm_evaluation
Users that are interested in mm_evaluation are comparing it to the libraries listed below
- Official GitHub repo of G-LLaVA ☆148 · Updated 9 months ago
- Official repository of MMDU dataset ☆98 · Updated last year
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark ☆127 · Updated 4 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆247 · Updated last year
- ☆155 · Updated last year
- ☆156 · Updated 10 months ago
- 🔥🔥 MLVU: Multi-task Long Video Understanding Benchmark ☆235 · Updated 3 months ago
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning ☆25 · Updated last year
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024] ☆234 · Updated 8 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU ☆358 · Updated last year
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆59 · Updated last year
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks ☆299 · Updated last year
- Long Context Transfer from Language to Vision ☆398 · Updated 8 months ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆124 · Updated 6 months ago
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022) ☆97 · Updated 2 years ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆92 · Updated last year
- Source code for EMNLP 2022 long paper: Parameter-Efficient Tuning Makes a Good Classification Head ☆14 · Updated 3 years ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆51 · Updated 8 months ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆42 · Updated last year
- Official repo for StableLLAVA ☆95 · Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆317 · Updated 10 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆598 · Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆293 · Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆356 · Updated 10 months ago
- Narrative movie understanding benchmark ☆77 · Updated 5 months ago
- [ICLR 2025] ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation ☆126 · Updated 5 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆298 · Updated last year
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆169 · Updated last year
- ☆133 · Updated last year
- A Large-scale Dataset for training and evaluating model's ability on Dense Text Image Generation ☆85 · Updated 2 months ago