Atomic-man007 / Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
★361 · Mar 19, 2025 · Updated 10 months ago
Alternatives and similar repositories for Awesome_Multimodel_LLM
Users interested in Awesome_Multimodel_LLM are comparing it to the repositories listed below.
- Latest Advances on Multimodal Large Language Models · ★17,337 · Updated this week
- A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). · ★979 · Sep 27, 2025 · Updated 4 months ago
- Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter. · ★502 · Jun 24, 2025 · Updated 7 months ago
- Paper list about multimodal and large language models, used only to record papers I read in the daily arXiv for personal needs. · ★754 · Jan 22, 2026 · Updated 3 weeks ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria · ★72 · Oct 16, 2024 · Updated last year
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). · ★1,233 · Jun 28, 2024 · Updated last year
- ★483 · Sep 25, 2024 · Updated last year
- This repository provides a valuable reference for researchers in the field of multimodality; please start your exploratory travel in RL-bas… · ★1,349 · Dec 7, 2025 · Updated 2 months ago
- Awesome papers for multi-modal LLMs with grounding ability · ★19 · Oct 11, 2025 · Updated 4 months ago
- ★4,552 · Sep 14, 2025 · Updated 5 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ★458 · Dec 2, 2024 · Updated last year
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 · ★3,534 · May 7, 2025 · Updated 9 months ago
- A Survey on Benchmarks of Multimodal Large Language Models · ★147 · Jul 1, 2025 · Updated 7 months ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training… · ★68 · May 7, 2025 · Updated 9 months ago
- This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation… · ★509 · Mar 18, 2025 · Updated 10 months ago
- [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs. · ★3,066 · Dec 20, 2025 · Updated last month
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. · ★840 · May 14, 2025 · Updated 8 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · ★336 · Jul 17, 2024 · Updated last year
- First Open-Source R1-like Video-LLM [2025/02/18] · ★381 · Feb 23, 2025 · Updated 11 months ago
- [TMM 2025] Mixture-of-Experts for Large Vision-Language Models · ★2,300 · Jul 15, 2025 · Updated 6 months ago
- VisionLLM Series · ★1,137 · Feb 27, 2025 · Updated 11 months ago
- Collection of AWESOME vision-language models for vision tasks · ★3,075 · Oct 14, 2025 · Updated 3 months ago
- ★15 · May 7, 2024 · Updated last year
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization · ★582 · Jun 7, 2024 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" · ★46 · Dec 1, 2024 · Updated last year
- Efficient Multimodal Large Language Models: A Survey · ★387 · Apr 29, 2025 · Updated 9 months ago
- ★547 · Nov 7, 2024 · Updated last year
- ★48 · Feb 26, 2025 · Updated 11 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… · ★62 · Nov 7, 2024 · Updated last year
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models · ★42 · Mar 11, 2025 · Updated 11 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs · ★98 · Jan 16, 2025 · Updated last year
- A family of highly capable yet efficient large multimodal models · ★191 · Aug 23, 2024 · Updated last year
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey · ★477 · Jan 17, 2025 · Updated last year
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning" · ★62 · Dec 8, 2025 · Updated 2 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence · ★11,166 · Nov 18, 2024 · Updated last year
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. · ★24,446 · Aug 12, 2024 · Updated last year
- A curated list of prompt-based papers in computer vision and vision-language learning. · ★928 · Dec 18, 2023 · Updated 2 years ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family · ★2,539 · Apr 2, 2025 · Updated 10 months ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) · ★42 · Dec 16, 2025 · Updated last month