WisconsinAIVision / YoLLaVALinks

🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant

☆115

Alternatives and similar repositories for YoLLaVA

Users that are interested in YoLLaVA are comparing it to the libraries listed below

Sorting:

chuangchuangtan / LLaVA-NeXT-Image-Llama3-Lora
LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft
☆44Updated last year
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32Updated 7 months ago
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆103Updated last year
mbzuai-oryx / CVRR-Evaluation-Suite
[CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…
☆50Updated last year
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆36Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136Updated 5 months ago
md-mohaiminul / BIMBA
☆26Updated 2 months ago
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆108Updated last month
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆191Updated last year
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆71Updated 2 months ago
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆145Updated 11 months ago
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆91Updated last year
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆168Updated last week
HKUST-LongGroup / CoMM
Official repository for CoMM Dataset
☆48Updated 9 months ago
vinid / neg_clip
NegCLIP.
☆35Updated 2 years ago
bronyayang / HallE_Control
HallE-Control: Controlling Object Hallucination in LMMs
☆31Updated last year
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆54Updated 5 months ago
aimagelab / HySAC
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
☆25Updated 6 months ago
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆24Updated 6 months ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆74Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆177Updated last year
imagegridworth / IG-VLM
☆138Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆120Updated 2 months ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.
☆29Updated last year
ys-zong / VL-ICL
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆65Updated 3 weeks ago
snap-research / MyVLM
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
☆179Updated last year
visual-haystacks / mirage
🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
☆20Updated 8 months ago
amazon-science / QA-ViT
☆70Updated last year
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆57Updated last year
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆189Updated 4 months ago