DavidMChan / clair

CLAIR: A (surprisingly) simple semantic text metric with large language models.

☆21

Alternatives and similar repositories for clair

Users who are interested in clair are comparing it to the repositories listed below.

allenai / aokvqa
Official repository for the A-OKVQA dataset
☆99 Updated last year
vinid / neg_clip
NegCLIP.
☆35 Updated 2 years ago
lscpku / VITATECS
☆18 Updated last year
SivanDoveh / DAC
Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models"
☆27 Updated last year
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆41 Updated 4 months ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56 Updated 2 years ago
Yushi-Hu / tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
☆175 Updated last year
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆86 Updated last year
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆141 Updated 2 weeks ago
allenai / close
☆59 Updated 2 years ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37 Updated 11 months ago
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆44 Updated last year
xichenpan / Kosmos-G
Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models
☆73 Updated last year
YujieLu10 / LLMScore
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
☆132 Updated last year
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32 Updated 7 months ago
cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
☆128 Updated 2 years ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆40 Updated 7 months ago
pipilurj / bootstrapped-preference-optimization-BPO
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆59 Updated last year
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43 Updated 4 months ago
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" NeurIPS 23 Spotlight
☆37 Updated 2 years ago
microsoft / VISOR
☆46 Updated last year
TAU-VAILab / hierarcaps
Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)
☆30 Updated last year
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆64 Updated 2 months ago
Hritikbansal / entigen_emnlp
How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?
☆13 Updated 2 years ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆48 Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
kaistAI / Volcano
[NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…
☆46 Updated last year
eslambakr / HRS_benchmark
☆61 Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆48 Updated 7 months ago
google-research-datasets / richhf-18k
RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t…
☆142 Updated last year