jingyi0000 / VLM_surveyLinks

Collection of AWESOME vision-language models for vision tasks

☆2,962

Alternatives and similar repositories for VLM_survey

Users that are interested in VLM_survey are comparing it to the libraries listed below

Sorting:

EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆3,199Updated this week
dvlab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,449Updated 8 months ago
Computer-Vision-in-the-Wild / CVinW_Readings
A collection of papers on the topic of ``Computer Vision in the Wild (CVinW)''
☆1,339Updated last year
jianzongwu / Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
☆955Updated 7 months ago
lxtGH / OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
☆1,324Updated last week
gokayfem / awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
☆1,041Updated 7 months ago
JindongGu / Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation …
☆491Updated 7 months ago
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆873Updated 7 months ago
keyu-tian / SparK
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for …
☆1,356Updated last year
MasterBin-IIAU / UNINEXT
[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval
☆1,277Updated 2 years ago
open-compass / VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆3,216Updated this week
HarborYuan / ovsam
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
☆1,007Updated 2 months ago
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆910Updated 5 months ago
bytedance / Sa2VA
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
☆1,315Updated this week
OpenGVLab / VisionLLM
VisionLLM Series
☆1,114Updated 7 months ago
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆594Updated last year
yunlong10 / Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
☆2,841Updated 2 weeks ago
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,925Updated this week
PKU-Alignment / align-anything
Align Anything: Training All-modality Model with Feedback
☆4,566Updated last month
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆922Updated 2 months ago
uncbiag / Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
☆1,102Updated 4 months ago
2U1 / Qwen-VL-Series-Finetune
An open-source implementaion for fine-tuning Qwen-VL series by Alibaba Cloud.
☆1,270Updated last week
yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,218Updated last year
SunzeY / AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
☆845Updated 3 months ago
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,742Updated last week
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,045Updated 11 months ago
DmitryRyumin / ICCV-2023-25-Papers
ICCV 2023-2025 Papers: Discover cutting-edge research from ICCV 2023-25, the leading computer vision conference. Stay updated on the late…
☆952Updated last year
om-ai-lab / OmDet
Real-time and accurate open-vocabulary end-to-end object detection
☆1,343Updated 10 months ago
FoundationVision / GLEE
[CVPR2024 Highlight]GLEE: General Object Foundation Model for Images and Videos at Scale
☆1,154Updated last year
PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
☆835Updated last year