DirtyHarryLYL/LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

☆872

Alternatives and similar repositories for LLM-in-Vision

Users who are interested in LLM-in-Vision are comparing it to the repositories listed below.

OpenGVLab / VisionLLM
VisionLLM Series
☆1,152 · Feb 27, 2025 · Updated last year

ttengwang / Awesome_Prompting_Papers_in_Computer_Vision
A curated list of prompt-based papers in computer vision and vision-language learning.
☆926 · Dec 18, 2023 · Updated 2 years ago

BradyFU / Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
☆17,950 · Jul 2, 2026 · Updated 2 weeks ago

mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963 · Aug 5, 2025 · Updated 11 months ago

DirtyHarryLYL / Transformer-in-Vision
Recent Transformer-based CV and related works.
☆1,345 · Aug 22, 2023 · Updated 2 years ago

baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago

jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year

shikras / shikra
☆814 · Jul 8, 2024 · Updated 2 years ago

salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,251 · Jun 2, 2026 · Updated last month

cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago

Computer-Vision-in-the-Wild / CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
☆1,371 · Mar 14, 2024 · Updated 2 years ago

JIA-Lab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,661 · Feb 16, 2025 · Updated last year

HenryHZY / Awesome-Multimodal-LLM
Research Trends in LLM-guided Multimodal Learning.
☆356 · Oct 17, 2023 · Updated 2 years ago

jianzongwu / Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
☆997 · May 12, 2026 · Updated 2 months ago

haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,928 · Aug 12, 2024 · Updated last year

OpenGVLab / LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆317 · Apr 16, 2024 · Updated 2 years ago

mlfoundations / open_flamingo
An open-source framework for training large multimodal models.
☆4,113 · Aug 31, 2024 · Updated last year

Yangyi-Chen / Multimodal-AND-Large-Language-Models
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
☆760 · May 21, 2026 · Updated last month

jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603 · Oct 6, 2024 · Updated last year

FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated 2 years ago

yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,229 · Jun 28, 2024 · Updated 2 years ago

awaisrauf / Awesome-CV-Foundational-Models
☆550 · Nov 7, 2024 · Updated last year

WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆338 · Jul 17, 2024 · Updated 2 years ago

baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,686 · Aug 1, 2024 · Updated last year

AILab-CVC / VL-GPT
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
☆86 · Sep 12, 2024 · Updated last year

ytongbai / LVM
☆1,835 · Jun 28, 2024 · Updated 2 years ago

jingyi0000 / VLM_survey
Collection of AWESOME vision-language models for vision tasks
☆3,129 · Oct 14, 2025 · Updated 9 months ago

LLaVA-VL / LLaVA-NeXT
☆4,708 · Jun 15, 2026 · Updated last month

baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 · Dec 6, 2024 · Updated last year

OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507 · Aug 9, 2024 · Updated last year

microsoft / GLIP
Grounded Language-Image Pre-training
☆2,604 · Jan 24, 2024 · Updated 2 years ago

open-mmlab / Multimodal-GPT
Multimodal-GPT
☆1,512 · Jun 4, 2023 · Updated 3 years ago

FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago

X-PLUG / mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,535 · Apr 2, 2025 · Updated last year

BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆167 · Jun 20, 2024 · Updated 2 years ago

EvolvingLMMs-Lab / Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…
☆3,424 · Mar 5, 2024 · Updated 2 years ago

luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249 · Aug 14, 2024 · Updated last year

allenai / unified-io-2
☆650 · Feb 15, 2024 · Updated 2 years ago

YimingCuiCuiCui / awesome-open-vocabulary-object-detection
☆57 · Jan 7, 2023 · Updated 3 years ago