OpenGVLab/LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents

☆317

Alternatives and similar repositories for LAMM

Users who are interested in LAMM are comparing it to the repositories listed below.

FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated 2 years ago
shikras / shikra
☆814 · Jul 8, 2024 · Updated 2 years ago
OpenGVLab / Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…
☆564 · Apr 21, 2024 · Updated 2 years ago
AILab-CVC / SEED-Bench
(CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆365 · Jan 14, 2025 · Updated last year
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year
OpenGVLab / VisionLLM
VisionLLM Series
☆1,152 · Feb 27, 2025 · Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆59 · Jun 27, 2023 · Updated 3 years ago
mightyzau / InfMLLM
☆19 · Dec 6, 2023 · Updated 2 years ago
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆871 · Mar 8, 2025 · Updated last year
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507 · Aug 9, 2024 · Updated last year
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆310 · Sep 11, 2024 · Updated last year
archiki / RepARe
☆21 · Oct 10, 2023 · Updated 2 years ago
luogen1996 / LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
☆522 · Jan 27, 2024 · Updated 2 years ago
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆37 · Sep 19, 2023 · Updated 2 years ago
X-PLUG / mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,535 · Apr 2, 2025 · Updated last year
mightyzau / RegionBLIP
☆59 · Aug 7, 2023 · Updated 2 years ago
staymylove / 3DMIT
Code of 3DMIT: 3D MULTI-MODAL INSTRUCTION TUNING FOR SCENE UNDERSTANDING
☆32 · Jul 26, 2024 · Updated last year
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963 · Aug 5, 2025 · Updated 11 months ago
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆107 · Mar 14, 2024 · Updated 2 years ago
IranQin / Awesome_World_Model_Papers
[World-Model-Survey-2024] Paper list and projects for World Model
☆15 · Oct 31, 2024 · Updated last year
IranQin / MP5
[CVPR2024] This is the official implementation of MP5
☆105 · Jun 30, 2024 · Updated 2 years ago
LLaVA-VL / LLaVA-NeXT
☆4,709 · Jun 15, 2026 · Updated last month
PKU-YuanGroup / MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,322 · Jul 15, 2025 · Updated last year
UCSB-AI / MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
☆867 · May 8, 2025 · Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603 · Oct 6, 2024 · Updated last year
FreedomIntelligence / MLLM-Bench
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
☆77 · Oct 16, 2024 · Updated last year
UMass-Embodied-AGI / 3D-LLM
Code for 3D-LLM: Injecting the 3D World into Large Language Models
☆1,209 · Jun 6, 2024 · Updated 2 years ago
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425 · May 5, 2025 · Updated last year
Chat-3D / Chat-3D
Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes"
☆57 · Mar 28, 2024 · Updated 2 years ago
zhangzaibin / AD-H
☆15 · May 21, 2026 · Updated last month
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year
NExT-ChatV / NExT-Chat
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
☆253 · Feb 5, 2024 · Updated 2 years ago
OpenGVLab / LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,916 · Mar 14, 2024 · Updated 2 years ago
JIA-Lab-research / LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861 · Jul 29, 2024 · Updated last year
InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921 · May 26, 2025 · Updated last year
VITA-MLLM / Woodpecker
Woodpecker: Hallucination Correction for Multimodal Large Language Models
☆649 · Dec 23, 2024 · Updated last year