swordlidev/Efficient-Multimodal-LLMs-Survey

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/swordlidev/Efficient-Multimodal-LLMs-Survey)

swordlidev / Efficient-Multimodal-LLMs-Survey

Efficient Multimodal Large Language Models: A Survey

☆386

Alternatives and similar repositories for Efficient-Multimodal-LLMs-Survey

Users that are interested in Efficient-Multimodal-LLMs-Survey are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

luogen1996 / LLaVA-HR
View on GitHub
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249Aug 14, 2024Updated last year
Yaxin9Luo / Gamma-MOD
View on GitHub
[ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆45Oct 28, 2025Updated 8 months ago
swordlidev / Evaluation-Multimodal-LLMs-Survey
View on GitHub
A Survey on Benchmarks of Multimodal Large Language Models
☆156Jul 13, 2026Updated last week
pkunlp-icler / FastV
View on GitHub
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆592Jan 4, 2025Updated last year
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,954Jul 2, 2026Updated 2 weeks ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
daixiangzi / Awesome-Token-Compress
View on GitHub
A paper list of some recent works about Token Compress for Vit and VLM
☆939Updated this week
cambrian-mllm / cambrian
View on GitHub
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008Nov 7, 2025Updated 8 months ago
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
yfzhang114 / SliME
View on GitHub
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆163Dec 26, 2024Updated last year
baaivision / DenseFusion
View on GitHub
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159Dec 6, 2024Updated last year
TinyLLaVA / TinyLLaVA_Factory
View on GitHub
A Framework of Small-scale Large Multimodal Models
☆992Updated this week
thunlp / LLaVA-UHD
View on GitHub
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423Jul 6, 2026Updated 2 weeks ago
open-compass / VLMEvalKit
View on GitHub
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,294Updated this week
UbiquitousLearning / Efficient_Foundation_Model_Survey
View on GitHub
Survey Paper List - Efficient LLM and Foundation Models
☆266Sep 22, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
JieShibo / MemVP
View on GitHub
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
☆49May 12, 2024Updated 2 years ago
showlab / Awesome-Unified-Multimodal-Models
View on GitHub
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆828Oct 10, 2025Updated 9 months ago
Yxxxb / VoCo-LLaMA
View on GitHub
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205Jun 18, 2025Updated last year
PhoenixZ810 / MG-LLaVA
View on GitHub
Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).
☆160Sep 27, 2024Updated last year
PKU-YuanGroup / MoE-LLaVA
View on GitHub
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322Jul 15, 2025Updated last year
shufangxun / LLaVA-MoD
View on GitHub
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆227Mar 31, 2025Updated last year
westlake-baichuan-mllm / bc-omni
View on GitHub
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
☆273Jan 27, 2025Updated last year
EvolvingLMMs-Lab / lmms-eval
View on GitHub
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,326Updated this week
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
42Shawn / LLaVA-PruMerge
View on GitHub
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆173Mar 8, 2026Updated 4 months ago
mbzuai-oryx / LLaVA-pp
View on GitHub
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
☆842Aug 5, 2025Updated 11 months ago
mrwu-mac / ControlMLLM
View on GitHub
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆210Jul 17, 2025Updated last year
LLaVA-VL / LLaVA-NeXT
View on GitHub
☆4,710Jun 15, 2026Updated last month
BAAI-DCAI / Bunny
View on GitHub
A family of lightweight multimodal models.
☆1,052Nov 18, 2024Updated last year
whwu95 / FreeVA
View on GitHub
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69Jun 9, 2024Updated 2 years ago
CircleRadon / TokenPacker
View on GitHub
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆279May 26, 2025Updated last year
OpenGVLab / InternVL
View on GitHub
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
☆10,097Sep 22, 2025Updated 10 months ago
InternLM / InternLM-XComposer
View on GitHub
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921May 26, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
SHI-Labs / CuMo
View on GitHub
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆163Jun 8, 2024Updated 2 years ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
View on GitHub
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,435May 11, 2026Updated 2 months ago
SkyworkAI / Vitron
View on GitHub
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆576Oct 20, 2024Updated last year
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507Aug 9, 2024Updated last year
Yangyi-Chen / Multimodal-AND-Large-Language-Models
View on GitHub
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
☆760May 21, 2026Updated 2 months ago
bfshi / scaling_on_scales
View on GitHub
When do we not need larger vision models?
☆420Feb 8, 2025Updated last year
CharlieDDDD / AISurveyPapers
View on GitHub
Large Visual Language Model(LVLM), Large Language Model(LLM), Multimodal Large Language Model(MLLM), Alignment, Agent, AI System, Survey
☆21Jul 27, 2025Updated 11 months ago