icoz69 / StableLLAVA

Official repo for StableLLAVA

☆95

Alternatives and similar repositories for StableLLAVA

Users who are interested in StableLLAVA are comparing it to the repositories listed below.

isekai-portal / Link-Context-Learning
☆100 Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
YujieLu10 / LLMScore
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
☆134 Updated 2 years ago
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147 Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆68 Updated 7 months ago
dvlab-research / Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
☆155 Updated last year
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆104 Updated 2 years ago
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆62 Updated 9 months ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆57 Updated 2 years ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
☆231 Updated 8 months ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76 Updated last year
AILab-CVC / VL-GPT
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
☆86 Updated last year
Hritikbansal / videocon
☆58 Updated last year
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆58 Updated last year
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
☆143 Updated last year
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆158 Updated 11 months ago
palchenli / VL-Instruction-Tuning
☆91 Updated 2 years ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90 Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆72 Updated last year
foundation-multimodal-models / CAPTURE
☆80 Updated last year
alibaba / conv-llava
☆123 Updated last year
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆159 Updated last year
showlab / cosmo
☆73 Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆45 Updated 2 years ago
Nicous20 / FunQA
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …
☆104 Updated 11 months ago
mlfoundations / VisIT-Bench
☆50 Updated 2 years ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆150 Updated 2 months ago
thunlp / Muffin
☆66 Updated last year