MILVLG/imp

MILVLG / imp

A family of highly capable yet efficient large multimodal models

☆194

Alternatives and similar repositories for imp

Users who are interested in imp are comparing it to the repositories listed below.

xmoanvaf / llava-phi
☆401 · Dec 12, 2024 · Updated last year
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆992 · Updated this week
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated 2 years ago
Meituan-AutoML / MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
☆1,364 · Apr 15, 2024 · Updated 2 years ago
Ucas-HaoranWei / Vary-family
☆57 · Jan 23, 2024 · Updated 2 years ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆420 · Feb 8, 2025 · Updated last year
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,052 · Nov 18, 2024 · Updated last year
snap-research / MyVLM
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
☆188 · Jul 5, 2024 · Updated 2 years ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆110 · May 27, 2025 · Updated last year
AIAnytime / Small-Multimodal-Vision-Model
Small Multimodal Vision Model "Imp-v1-3b" trained using Phi-2 and Siglip.
☆17 · Feb 5, 2024 · Updated 2 years ago
Ucas-HaoranWei / Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
☆630 · Dec 30, 2024 · Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆150 · Nov 14, 2024 · Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆183 · Oct 14, 2024 · Updated last year
MILVLG / mt-captioning
A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning
☆25 · Sep 4, 2020 · Updated 5 years ago
X-PLUG / mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
☆100 · Jan 29, 2024 · Updated 2 years ago
CircleRadon / TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆278 · May 26, 2025 · Updated last year
Meituan-AutoML / VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
☆392 · Jul 9, 2024 · Updated 2 years ago
PKU-YuanGroup / MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322 · Jul 15, 2025 · Updated last year
DLYuanGod / TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
☆1,316 · Feb 5, 2026 · Updated 5 months ago
rxtan2 / Koala-video-llm
☆37 · Sep 16, 2024 · Updated last year
MetabrainAGI / Awaker2.5-VL
☆35 · Jan 21, 2025 · Updated last year
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
☆73 · Oct 17, 2025 · Updated 9 months ago
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374 · Jul 24, 2025 · Updated 11 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆174 · Sep 25, 2024 · Updated last year
ByungKwanLee / Phantom
[Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …
☆63 · Oct 9, 2024 · Updated last year
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆132 · Aug 21, 2024 · Updated last year
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423 · Jul 6, 2026 · Updated 2 weeks ago
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆329 · Jan 20, 2025 · Updated last year
DCDmllm / Momentor
☆81 · Nov 24, 2024 · Updated last year
apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,425 · Aug 4, 2025 · Updated 11 months ago
facebookresearch / unibench
Python Library to evaluate VLM models' robustness across diverse benchmarks
☆227 · Jun 30, 2026 · Updated 2 weeks ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963 · Aug 5, 2025 · Updated 11 months ago
MILVLG / twigvlm
Implementation of ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".
☆30 · May 23, 2026 · Updated last month
MILVLG / rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
☆57 · Jun 13, 2023 · Updated 3 years ago
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆669 · Jul 28, 2024 · Updated last year
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆126 · Jan 14, 2025 · Updated last year
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507 · Aug 9, 2024 · Updated last year
zai-org / CogCoM
☆222 · Jul 5, 2024 · Updated 2 years ago
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆310 · Sep 11, 2024 · Updated last year