OpenGVLab / VisionLLM

VisionLLM Series

☆1,130

Alternatives and similar repositories for VisionLLM

Users who are interested in VisionLLM are comparing it to the repositories listed below.

mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆929 Updated 4 months ago
shikras / shikra
☆799 Updated last year
jshilong / GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆549 Updated 6 months ago
PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
☆852 Updated last year
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆503 Updated last year
LLaVA-VL / LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆763 Updated last year
csuhan / OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
☆665 Updated last year
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆929 Updated 7 months ago
SkyworkAI / Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆577 Updated last year
rese1f / MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆670 Updated 10 months ago
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆874 Updated 9 months ago
baaivision / tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
☆600 Updated 11 months ago
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,760 Updated last year
dvlab-research / LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆851 Updated last year
IDEA-Research / Grounding-DINO-1.5-API
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
☆1,064 Updated 10 months ago
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,047 Updated last year
SunzeY / AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
☆854 Updated 4 months ago
Computer-Vision-in-the-Wild / CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
☆1,351 Updated last year
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆399 Updated last week
dvlab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,502 Updated 9 months ago
microsoft / LLM2CLIP
LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.
☆567 Updated this week
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆681 Updated last year
Meituan-AutoML / MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
☆1,308 Updated last year
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆2,125 Updated this week
OpenGVLab / Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…
☆552 Updated last year
awaisrauf / Awesome-CV-Foundational-Models
☆540 Updated last year
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,765 Updated last week
allenai / unified-io-2
☆632 Updated last year
DAMO-NLP-SG / VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,255 Updated 10 months ago
IDEA-Research / OpenSeeD
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆741 Updated last year