Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
★372 · Mar 19, 2025 · Updated last year
Alternatives and similar repositories for Awesome_Multimodel_LLM
Users that are interested in Awesome_Multimodel_LLM are comparing it to the libraries listed below.
- Latest Advances on Multimodal Large Language Models ★17,849 · May 1, 2026 · Updated last month
- A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). ★1,021 · Sep 27, 2025 · Updated 8 months ago
- Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter. ★507 · Jun 24, 2025 · Updated 11 months ago
- Reading list for Multimodal Large Language Models ★70 · Aug 17, 2023 · Updated 2 years ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ★76 · Oct 16, 2024 · Updated last year
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). ★1,230 · Jun 28, 2024 · Updated last year
- Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs. ★759 · May 21, 2026 · Updated 2 weeks ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ★1,418 · May 11, 2026 · Updated 3 weeks ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ★77 · May 7, 2025 · Updated last year
- Project for SNARE benchmark ★11 · Jun 5, 2024 · Updated 2 years ago
- ★491 · Sep 25, 2024 · Updated last year
- Efficient Multimodal Large Language Models: A Survey ★385 · Apr 29, 2025 · Updated last year
- ★15 · May 7, 2024 · Updated 2 years ago
- Collection of AWESOME vision-language models for vision tasks ★3,124 · Oct 14, 2025 · Updated 7 months ago
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 ★3,630 · Apr 20, 2026 · Updated last month
- ★4,687 · Apr 15, 2026 · Updated last month
- VisionLLM Series ★1,149 · Feb 27, 2025 · Updated last year
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ★383 · Feb 23, 2025 · Updated last year
- Implementation of the Benchmark Approaches for Medical Instructional Video Classification (MedVidCL) and Medical Video Question Answering… ★31 · Jan 31, 2023 · Updated 3 years ago
- Large language model of Medical AI, General Medical AI (GMAI) ★17 · Jan 30, 2024 · Updated 2 years ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ★11,229 · Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ★24,848 · Aug 12, 2024 · Updated last year
- open llm for multimodal ★20 · May 18, 2023 · Updated 3 years ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ★43 · Dec 16, 2025 · Updated 5 months ago
- 🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs. ★3,195 · Mar 28, 2026 · Updated 2 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ★339 · Jul 17, 2024 · Updated last year
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ★156 · Apr 30, 2024 · Updated 2 years ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★461 · Dec 2, 2024 · Updated last year
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. ★844 · May 14, 2025 · Updated last year
- This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation … ★510 · Mar 18, 2025 · Updated last year
- ★549 · Nov 7, 2024 · Updated last year
- A curated list of prompt-based paper in computer vision and vision-language learning. ★926 · Dec 18, 2023 · Updated 2 years ago
- Recent Advances in Vision and Language Pre-training (VLP) ★297 · Jun 6, 2023 · Updated 3 years ago
- A curated list of awesome Multimodal studies. ★337 · May 13, 2026 · Updated 3 weeks ago
- A Survey on multimodal learning research. ★333 · Aug 22, 2023 · Updated 2 years ago
- Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology. ★36 · Jan 20, 2024 · Updated 2 years ago
- Code for ICML 2023 paper "When and How Does Known Class Help Discover Unknown Ones? Provable Understandings Through Spectral Analysis" ★14 · Jun 24, 2023 · Updated 2 years ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning" ★73 · Dec 8, 2025 · Updated 6 months ago
- (CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ★364 · Jan 14, 2025 · Updated last year