facebookresearch / selective-vqa_ood
Implementation for the CVPR 2023 paper "Improving Selective Visual Question Answering by Learning from Your Peers" (https://arxiv.org/abs/2306.08751)
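The paper's setting, selective VQA, lets a model abstain from answering when it is unsure, and evaluates it by the trade-off between coverage (how often it answers) and selective risk (how often the given answers are wrong). A minimal sketch of that evaluation, assuming a simple confidence threshold; the function name and inputs are illustrative, not the repo's API:

```python
def risk_coverage(confidences, correct, threshold):
    """Selective-prediction metrics at a given confidence threshold.

    coverage: fraction of questions the model chooses to answer
    selective risk: error rate among the answered questions
    """
    # Keep the correctness flags of only the questions the model answers.
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    if not answered:
        return 0.0, 0.0  # abstains on everything: no risk incurred
    coverage = len(answered) / len(confidences)
    risk = sum(1 for ok in answered if not ok) / len(answered)
    return coverage, risk

# Toy example: 4 questions; the model is unsure about (and wrong on) the third.
conf = [0.9, 0.8, 0.4, 0.95]
corr = [True, True, False, True]
print(risk_coverage(conf, corr, 0.5))  # (0.75, 0.0): answers 3 of 4, all correct
```

Sweeping the threshold from 0 to 1 traces a risk-coverage curve; a better selective model keeps risk low at higher coverage.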
Related projects:
- [ICML 2024] Official implementation of the paper "Rejuvenating image-GPT as Strong Visual Representation Lea…"
- Matryoshka Multimodal Models
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges
- [ACL 2023] MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
- Official PyTorch implementation of Self-emerging Token Labeling
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?"
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
- [CBMI 2024] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?"
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
- Code for the experiments in "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
- Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models"
- [ACL 2024, Oral] Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
- Code and models for "COSA: Concatenated Sample Pretrained Vision-Language Foundation Model"
- Official repository of the paper "Subobject-level Image Tokenization"
- Official repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
- Implementation of MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding"
- Official implementation of the paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters"