A curated list of awesome Multimodal studies.
★325 · Mar 11, 2026 · Updated last month
Alternatives and similar repositories for Awesome-Multimodal-Papers
Users that are interested in Awesome-Multimodal-Papers are comparing it to the libraries listed below.
- Awesome papers on token redundancy reduction · ★11 · Mar 12, 2025 · Updated last year
- Official Repository of RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning · ★14 · Jul 9, 2025 · Updated 10 months ago
- Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs. · ★758 · Apr 6, 2026 · Updated last month
- Latest Advances on Multimodal Large Language Models · ★17,736 · May 1, 2026 · Updated last week
- An open-source implementation for fine-tuning DINOv2 by Meta. · ★14 · Jul 21, 2025 · Updated 9 months ago
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. · ★821 · Oct 10, 2025 · Updated 7 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… · ★85 · Jan 27, 2025 · Updated last year
- ★24 · Aug 9, 2025 · Updated 9 months ago
- Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain… · ★113 · Aug 21, 2025 · Updated 8 months ago
- ★21 · Jul 9, 2025 · Updated 10 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models · ★42 · Apr 10, 2025 · Updated last year
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). · ★546 · Apr 4, 2025 · Updated last year
- Famous Vision Language Models and Their Architectures · ★1,241 · Jan 11, 2026 · Updated 3 months ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning" · ★72 · Dec 8, 2025 · Updated 5 months ago
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation · ★23 · Jul 30, 2025 · Updated 9 months ago
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding · ★29 · Dec 18, 2025 · Updated 4 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability" · ★34 · Jul 12, 2024 · Updated last year
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" · ★20 · Jul 21, 2025 · Updated 9 months ago
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https… · ★22 · Jul 5, 2024 · Updated last year
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… · ★1,408 · Apr 19, 2026 · Updated 3 weeks ago
- ★25 · Nov 17, 2025 · Updated 5 months ago
- Collection of AWESOME vision-language models for vision tasks · ★3,117 · Oct 14, 2025 · Updated 6 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] · ★639 · Apr 28, 2026 · Updated last week
- Agentic Keyframe Search for Video Question Answering · ★18 · Apr 7, 2025 · Updated last year
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… · ★370 · Mar 19, 2025 · Updated last year
- [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f… · ★21 · Dec 4, 2024 · Updated last year
- ★39 · Updated this week
- [CVPR'24] Official implementation of our paper "Self-Supervised Facial Representation Learning with Facial Region Awareness" · ★15 · Mar 8, 2024 · Updated 2 years ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback · ★308 · Sep 11, 2024 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" · ★33 · Mar 26, 2025 · Updated last year
- Efficient Multimodal Large Language Models: A Survey · ★386 · Apr 29, 2025 · Updated last year
- ★14 · May 26, 2023 · Updated 2 years ago
- Reading list for research topics in multimodal machine learning · ★6,863 · Aug 20, 2024 · Updated last year
- MLLM @ Game · ★16 · May 12, 2025 · Updated 11 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models · ★98 · Sep 14, 2024 · Updated last year
- [ICCV 2025] Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning. · ★55 · Apr 30, 2026 · Updated last week
- A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). · ★1,016 · Sep 27, 2025 · Updated 7 months ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… · ★23 · Jan 11, 2026 · Updated 3 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" · ★207 · Jul 17, 2025 · Updated 9 months ago