Atomic-man007 / Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
★361 · Mar 19, 2025 · Updated 10 months ago
Alternatives and similar repositories for Awesome_Multimodel_LLM
Users interested in Awesome_Multimodel_LLM are comparing it to the repositories listed below.
- Latest Advances on Multimodal Large Language Models · ★17,337 · Updated this week
- A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). · ★979 · Sep 27, 2025 · Updated 4 months ago
- Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter. · ★502 · Jun 24, 2025 · Updated 7 months ago
- Paper list about multimodal and large language models, used only to record papers I read in the daily arXiv for personal needs. · ★754 · Jan 22, 2026 · Updated 3 weeks ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria · ★72 · Oct 16, 2024 · Updated last year
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). · ★1,233 · Jun 28, 2024 · Updated last year
- ★483 · Sep 25, 2024 · Updated last year
- This repository provides a valuable reference for researchers in the field of multimodality; please start your exploratory travel in RL-bas… · ★1,349 · Dec 7, 2025 · Updated 2 months ago
- Awesome papers for multi-modal LLMs with grounding ability · ★19 · Oct 11, 2025 · Updated 4 months ago
- ★4,552 · Sep 14, 2025 · Updated 5 months ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ★458 · Dec 2, 2024 · Updated last year
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 · ★3,534 · May 7, 2025 · Updated 9 months ago
- A Survey on Benchmarks of Multimodal Large Language Models · ★147 · Jul 1, 2025 · Updated 7 months ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training… · ★68 · May 7, 2025 · Updated 9 months ago
- This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation… · ★509 · Mar 18, 2025 · Updated 10 months ago
- [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs. · ★3,066 · Dec 20, 2025 · Updated last month
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. · ★840 · May 14, 2025 · Updated 8 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · ★336 · Jul 17, 2024 · Updated last year
- First Open-Source R1-like Video-LLM [2025/02/18] · ★381 · Feb 23, 2025 · Updated 11 months ago
- [TMM 2025] Mixture-of-Experts for Large Vision-Language Models · ★2,300 · Jul 15, 2025 · Updated 6 months ago
- VisionLLM Series · ★1,137 · Feb 27, 2025 · Updated 11 months ago
- Collection of AWESOME vision-language models for vision tasks · ★3,075 · Oct 14, 2025 · Updated 3 months ago
- ★15 · May 7, 2024 · Updated last year
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization · ★582 · Jun 7, 2024 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" · ★46 · Dec 1, 2024 · Updated last year
- Efficient Multimodal Large Language Models: A Survey · ★387 · Apr 29, 2025 · Updated 9 months ago
- ★547 · Nov 7, 2024 · Updated last year
- ★48 · Feb 26, 2025 · Updated 11 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… · ★62 · Nov 7, 2024 · Updated last year
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models · ★42 · Mar 11, 2025 · Updated 11 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs · ★98 · Jan 16, 2025 · Updated last year
- A family of highly capable yet efficient large multimodal models · ★191 · Aug 23, 2024 · Updated last year
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey · ★477 · Jan 17, 2025 · Updated last year
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning" · ★62 · Dec 8, 2025 · Updated 2 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence · ★11,166 · Nov 18, 2024 · Updated last year
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. · ★24,446 · Aug 12, 2024 · Updated last year
- A curated list of prompt-based papers in computer vision and vision-language learning. · ★928 · Dec 18, 2023 · Updated 2 years ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family · ★2,539 · Apr 2, 2025 · Updated 10 months ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) · ★42 · Dec 16, 2025 · Updated last month