NVlabs / VILA-archive

☆246

Related projects:

WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆275 Updated 2 months ago
pkunlp-icler / FastV
[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Mo…
☆212 Updated last month
bfshi / scaling_on_scales
When do we not need larger vision models?
☆314 Updated last month
thunlp / LLaVA-UHD
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
☆298 Updated last month
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆293 Updated 3 weeks ago
h-zhao1997 / cobra
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
☆240 Updated last month
TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
☆414 Updated 2 months ago
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆551 Updated last month
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of …
☆444 Updated last month
AILab-CVC / SEED-Bench
(CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆303 Updated 2 months ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆239 Updated 2 months ago
tsb0601 / MMVP
☆277 Updated 7 months ago
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆138 Updated last week
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆498 Updated 2 months ago
OpenGVLab / LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆296 Updated 5 months ago
RLHF-V / RLAIF-V
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
☆200 Updated last week
OpenGVLab / OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆246 Updated 3 weeks ago
Meituan-AutoML / VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
☆356 Updated 2 months ago
luogen1996 / LLaVA-HR
LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆202 Updated last month
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆378 Updated 5 months ago
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆252 Updated 3 weeks ago
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆128 Updated last month
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆211 Updated 2 months ago
BradyFU / Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆365 Updated 3 months ago
showlab / videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
☆177 Updated last month
OpenGVLab / InternVideo2
☆199 Updated 5 months ago
llava-rlhf / LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
☆302 Updated 10 months ago
mbzuai-oryx / VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
☆188 Updated last month
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆235 Updated 8 months ago
baaivision / EVE
EVE: Encoder-Free Vision-Language Models
☆207 Updated last month