EvolvingLMMs-Lab / lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

☆3,255

Alternatives and similar repositories for lmms-eval

Users interested in lmms-eval also compare it to the libraries listed below.


PKU-Alignment / align-anything
Align Anything: Training All-modality Model with Feedback
☆4,574 · Updated 2 months ago
jingyi0000 / VLM_survey
Collection of AWESOME vision-language models for vision tasks
☆2,986 · Updated 3 weeks ago
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,927 · Updated 2 weeks ago
open-compass / VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
☆3,294 · Updated this week
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for the Show-o series: One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,756 · Updated 2 weeks ago
HITsz-TMG / Uni-MoE
Uni-MoE: Lychee's Large Multimodal Model Family.
☆795 · Updated last week
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,416 · Updated 8 months ago
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆752 · Updated 2 months ago
NVlabs / Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
☆890 · Updated last week
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,046 · Updated 11 months ago
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆912 · Updated 6 months ago
lxtGH / OMG-Seg
Official repo for the OMG-LLaVA and OMG-Seg codebases [CVPR-24 and NeurIPS-24]
☆1,327 · Updated 3 weeks ago
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆873 · Updated 2 months ago
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆594 · Updated last year
ShareGPT4Omni / ShareGPT4Video
[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"
☆1,078 · Updated last year
showlab / Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
☆885 · Updated last month
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides a valuable reference for researchers in the field of multimodality; please start your exploratory travel in RL-bas…
☆1,250 · Updated 2 weeks ago
ZiyuGuo99 / Image-Generation-CoT
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
☆827 · Updated 5 months ago
Osilly / Vision-R1
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages …
☆722 · Updated last month
xiaoachen98 / Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
☆425 · Updated last year
RLHFlow / RLHF-Reward-Modeling
Recipes to train reward models for RLHF.
☆1,475 · Updated 6 months ago
Simple-Efficient / RL-Factory
Train your Agent model via our easy and efficient framework
☆1,600 · Updated this week
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆3,963 · Updated this week
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,073 · Updated last month
EvolvingLMMs-Lab / Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…
☆3,276 · Updated last year
luo-junyu / Awesome-Agent-Papers
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
☆1,990 · Updated 3 weeks ago
HJYao00 / Mulberry
[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
☆1,223 · Updated last month
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆827 · Updated 5 months ago
om-ai-lab / OmDet
Real-time and accurate open-vocabulary end-to-end object detection
☆1,344 · Updated 10 months ago
Zefan-Cai / KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
☆1,272 · Updated 10 months ago