luogen1996/LaVIN

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/luogen1996/LaVIN)

luogen1996 / LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

☆522

Alternatives and similar repositories for LaVIN

Users that are interested in LaVIN are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

SALT-NLP / LLaVAR
View on GitHub
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268Jun 12, 2024Updated 2 years ago
shikras / shikra
View on GitHub
☆814Jul 8, 2024Updated 2 years ago
X-PLUG / mPLUG-Owl
View on GitHub
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,535Apr 2, 2025Updated last year
baaivision / Emu
View on GitHub
Emu Series: Generative Multimodal Models from BAAI
☆1,776Jan 12, 2026Updated 6 months ago
luogen1996 / SimREC
View on GitHub
A lightweight codebase for referring expression comprehension and segmentation
☆57May 21, 2022Updated 4 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
VPGTrans / VPGTrans
View on GitHub
Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
☆269Oct 13, 2023Updated 2 years ago
jshilong / GPT4RoI
View on GitHub
(ECCVW 2025)GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556Jun 3, 2025Updated last year
EvolvingLMMs-Lab / Otter
View on GitHub
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…
☆3,425Mar 5, 2024Updated 2 years ago
OpenGVLab / LLaMA-Adapter
View on GitHub
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,917Mar 14, 2024Updated 2 years ago
BAAI-DCAI / Visual-Instruction-Tuning
View on GitHub
SVIT: Scaling up Visual Instruction Tuning
☆167Jun 20, 2024Updated 2 years ago
luogen1996 / RepAdapter
View on GitHub
Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".
☆189Apr 18, 2024Updated 2 years ago
mlfoundations / open_flamingo
View on GitHub
An open-source framework for training large multimodal models.
☆4,114Aug 31, 2024Updated last year
mshukor / eP-ALM
View on GitHub
[ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.
☆27Oct 27, 2023Updated 2 years ago
salesforce / LAVIS
View on GitHub
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,255Jun 2, 2026Updated last month
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
open-mmlab / Multimodal-GPT
View on GitHub
Multimodal-GPT
☆1,512Jun 4, 2023Updated 3 years ago
luogen1996 / LLaVA-HR
View on GitHub
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249Aug 14, 2024Updated last year
Alpha-VLLM / LLaMA2-Accessory
View on GitHub
An Open-source Toolkit for LLM Development
☆2,802Jan 13, 2025Updated last year
yuweihao / MM-Vet
View on GitHub
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆329Jan 20, 2025Updated last year
jy0205 / LaVIT
View on GitHub
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603Oct 6, 2024Updated last year
baaivision / CapsFusion
View on GitHub
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215Feb 27, 2024Updated 2 years ago
baaivision / EVA
View on GitHub
EVA Series: Visual Representation Fantasies from BAAI
☆2,684Aug 1, 2024Updated last year
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
InternLM / InternLM-XComposer
View on GitHub
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921May 26, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,954Jul 2, 2026Updated 2 weeks ago
OpenGVLab / VisionLLM
View on GitHub
VisionLLM Series
☆1,152Feb 27, 2025Updated last year
icoz69 / StableLLAVA
View on GitHub
Official repo for StableLLAVA
☆94Dec 22, 2023Updated 2 years ago
OpenGVLab / LAMM
View on GitHub
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆317Apr 16, 2024Updated 2 years ago
phellonchen / X-LLM
View on GitHub
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
☆318Jul 14, 2026Updated last week
seanzhuh / SeqTR
View on GitHub
SeqTR: A Simple yet Universal Network for Visual Grounding
☆144Oct 30, 2024Updated last year
FuxiaoLiu / LRV-Instruction
View on GitHub
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297Mar 13, 2024Updated 2 years ago
DAMO-NLP-SG / Video-LLaMA
View on GitHub
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
☆3,139Jun 4, 2024Updated 2 years ago
TencentARC / GVT
View on GitHub
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆59Jun 27, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
RUCAIBox / POPE
View on GitHub
The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆265Aug 21, 2025Updated 11 months ago
amazon-science / prompt-pretraining
View on GitHub
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
☆259May 3, 2024Updated 2 years ago
mbzuai-oryx / groundingLMM
View on GitHub
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963Aug 5, 2025Updated 11 months ago
haotian-liu / LLaVA
View on GitHub
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,932Aug 12, 2024Updated last year
BAAI-DCAI / Bunny
View on GitHub
A family of lightweight multimodal models.
☆1,052Nov 18, 2024Updated last year
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507Aug 9, 2024Updated last year
palchenli / VL-Instruction-Tuning
View on GitHub
☆90Nov 25, 2023Updated 2 years ago