A curated list of awesome Multimodal studies.
★318 · Mar 11, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for Awesome-Multimodal-Papers
Users that are interested in Awesome-Multimodal-Papers are comparing it to the libraries listed below.
- Awesome papers on token redundancy reduction · ★11 · Mar 12, 2025 · Updated last year
- Official Repository of RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning · ★14 · Jul 9, 2025 · Updated 8 months ago
- Latest Advances on Multimodal Large Language Models · ★17,505 · Mar 20, 2026 · Updated last week
- An open-source implementation for fine-tuning DINOv2 by Meta. · ★14 · Jul 21, 2025 · Updated 8 months ago
- Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs. · ★755 · Updated this week
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. · ★807 · Oct 10, 2025 · Updated 5 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… · ★86 · Jan 27, 2025 · Updated last year
- (no description) · ★21 · Jul 9, 2025 · Updated 8 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models · ★43 · Apr 10, 2025 · Updated 11 months ago
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). · ★541 · Apr 4, 2025 · Updated 11 months ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning" · ★69 · Dec 8, 2025 · Updated 3 months ago
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation · ★23 · Jul 30, 2025 · Updated 8 months ago
- Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain… · ★112 · Aug 21, 2025 · Updated 7 months ago
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding · ★29 · Dec 18, 2025 · Updated 3 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability" · ★34 · Jul 12, 2024 · Updated last year
- Famous Vision Language Models and Their Architectures · ★1,210 · Jan 11, 2026 · Updated 2 months ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" · ★21 · Jul 21, 2025 · Updated 8 months ago
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https… · ★22 · Jul 5, 2024 · Updated last year
- (no description) · ★25 · Nov 17, 2025 · Updated 4 months ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… · ★1,389 · Feb 26, 2026 · Updated last month
- Collection of AWESOME vision-language models for vision tasks · ★3,102 · Oct 14, 2025 · Updated 5 months ago
- Agentic Keyframe Search for Video Question Answering · ★16 · Apr 7, 2025 · Updated 11 months ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… · ★365 · Mar 19, 2025 · Updated last year
- [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f… · ★21 · Dec 4, 2024 · Updated last year
- [CVPR'24] Official implementation of our paper "Self-Supervised Facial Representation Learning with Facial Region Awareness" · ★14 · Mar 8, 2024 · Updated 2 years ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] · ★604 · Mar 23, 2026 · Updated last week
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" · ★33 · Mar 26, 2025 · Updated last year
- (no description) · ★14 · May 26, 2023 · Updated 2 years ago
- MLLM @ Game · ★16 · May 12, 2025 · Updated 10 months ago
- Reading list for research topics in multimodal machine learning · ★6,845 · Aug 20, 2024 · Updated last year
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le… · ★13 · Jan 16, 2025 · Updated last year
- [ICCV 2025] Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning. · ★53 · Mar 17, 2026 · Updated last week
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback · ★306 · Sep 11, 2024 · Updated last year
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… · ★21 · Jan 11, 2026 · Updated 2 months ago
- Efficient Multimodal Large Language Models: A Survey · ★388 · Apr 29, 2025 · Updated 11 months ago
- A curated list of resources dedicated to hallucination of multimodal large language models (MLLM). · ★996 · Sep 27, 2025 · Updated 6 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" · ★206 · Jul 17, 2025 · Updated 8 months ago
- Official Repository for CLRCMD (Appear in ACL2022) · ★43 · Feb 21, 2023 · Updated 3 years ago
- VHTest · ★16 · Oct 31, 2024 · Updated last year