codezakh / SelTDA

[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

☆16

Alternatives and similar repositories for SelTDA

Users who are interested in SelTDA are comparing it to the repositories listed below.

yfzhang114 / LLaVA-Align
This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…
☆78 Updated 3 months ago
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆91 Updated last week
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆81 Updated last year
ys-zong / VL-ICL
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆56 Updated 4 months ago
vinid / neg_clip
NegCLIP.
☆32 Updated 2 years ago
MIV-XJTU / FLAME
[CVPR 2025] PyTorch implementation of paper "FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training"
☆28 Updated last month
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆51 Updated last year
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆28 Updated last year
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆35 Updated 10 months ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆45 Updated 10 months ago
palchenli / VL-Instruction-Tuning
☆91 Updated last year
JieShibo / MemVP
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
☆49 Updated last year
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆41 Updated 8 months ago
aimagelab / ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
☆33 Updated 2 months ago
Shengcao-Cao / groundLMM
Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
☆41 Updated 2 months ago
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆34 Updated 10 months ago
SooLab / DDCOT
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆44 Updated last year
mrwu-mac / R-Bench
[ICML 2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models"
☆21 Updated 5 months ago
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆74 Updated 11 months ago
allenai / close
☆59 Updated last year
eric-ai-lab / GRIT
Official code for paper "GRIT: Teaching MLLMs to Think with Images"
☆64 Updated this week
mlvlab / RPO
Official Implementation of "Read-only Prompt Optimization for Vision-Language Few-shot Learning", ICCV 2023
☆53 Updated last year
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆34 Updated last year
YiyangZhou / POVID
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆86 Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆60 Updated 11 months ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆52 Updated 7 months ago
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆34 Updated last month
codezakh / LilT
[ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
☆39 Updated last year
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆53 Updated 7 months ago
GasolSun36 / MVP
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
☆22 Updated 8 months ago