inquire-benchmark / INQUIRELinks

This repo contains the evaluation code for the INQUIRE benchmark

☆59

Alternatives and similar repositories for INQUIRE

Users that are interested in INQUIRE are comparing it to the libraries listed below

Sorting:

Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆129Updated last month
Imageomics / bioclip
This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].
☆234Updated 5 months ago
naver-ai / prolip
☆55Updated 3 months ago
ninatu / howtocaption
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
☆55Updated 3 months ago
tulip-berkeley / open_clip
An open source implementation of CLIP (With TULIP Support)
☆163Updated 6 months ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35Updated 2 years ago
TAU-VAILab / hierarcaps
Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)
☆32Updated last year
lukasknobel / SelfCollages
Learning to Count without Annotations
☆23Updated last year
shashankvkt / DoRA_ICLR24
This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long …
☆94Updated last year
danielchyeh / this-is-my
Official This-Is-My Dataset published in CVPR 2023
☆16Updated last year
facebookresearch / meru
Code for the paper "Hyperbolic Image-Text Representations", Desai et al, ICML 2023
☆187Updated 2 years ago
ChenDelong1999 / subobjects
Official repository of paper "Subobject-level Image Tokenization" (ICML-25)
☆90Updated 5 months ago
lucas-ventura / CoVR
Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".
☆118Updated last month
aimagelab / awesome-captioning-evaluation
[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
☆23Updated last week
google-deepmind / perception_test
☆238Updated 6 months ago
mbzuai-oryx / CVRR-Evaluation-Suite
[CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…
☆50Updated last year
google-research / silc
[ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation
☆47Updated last year
NikosEfth / freedom
Official PyTorch implementation of the WACV 2025 Oral paper "Composed Image Retrieval for Training-FREE DOMain Conversion".
☆45Updated 3 months ago
dfan / webssl
Code for Scaling Language-Free Visual Representation Learning (WebSSL)
☆245Updated 7 months ago
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆134Updated 7 months ago
nickjiang2378 / test-time-registers
[NeurIPS '25 Spotlight] Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers"
☆145Updated 2 months ago
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆67Updated last year
goel-shashank / CyCLIP
☆120Updated 2 years ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.
☆29Updated last year
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25Updated last year
facebookresearch / genecis
Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"
☆61Updated 2 years ago
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆87Updated last year
billpsomas / rscir
Official PyTorch implementation and benchmark dataset for IGARSS 2024 ORAL paper: "Composed Image Retrieval for Remote Sensing"
☆80Updated 11 months ago
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆194Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆101Updated 8 months ago