ilkerkesen / ViLMALinks

ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)

☆16

Alternatives and similar repositories for ViLMA

Users that are interested in ViLMA are comparing it to the libraries listed below

Sorting:

facebookresearch / HierVL
[CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings
☆46Updated 2 years ago
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆44Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11Updated 2 years ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆34Updated 2 years ago
k1rezaei / Text-to-concept
☆35Updated last year
jeykigung / HiCLIP
☆30Updated 2 years ago
SMILE-data / SMILE
SMILE: A Multimodal Dataset for Understanding Laughter
☆12Updated 2 years ago
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆35Updated last year
LAION-AI / General-GPT
☆65Updated 2 years ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56Updated 2 years ago
danielchyeh / this-is-my
Official This-Is-My Dataset published in CVPR 2023
☆16Updated last year
bpiyush / TestOfTime
Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time
☆46Updated last year
Hritikbansal / videocon
☆57Updated last year
ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆21Updated last year
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆24Updated 10 months ago
jmerullo / limber
https://arxiv.org/abs/2209.15162
☆52Updated 2 years ago
naver-ai / prolip
☆53Updated last month
umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆79Updated last year
kdariina / CLIP-not-BoW-unimodally
Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"
☆16Updated 7 months ago
facebookresearch / genecis
Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"
☆60Updated 2 years ago
Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆121Updated last year
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆48Updated 3 months ago
ninatu / howtocaption
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
☆55Updated last month
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆12Updated 2 years ago
SivanDoveh / DAC
Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models
☆27Updated last year
jonkahana / CLIPPR
An official PyTorch implementation for CLIPPR
☆29Updated 2 years ago
NVlabs / PALAVRA
☆53Updated 3 years ago
mshukor / eP-ALM
[ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.
☆27Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆111Updated 8 months ago
RotsteinNoam / FuseCap
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
☆55Updated last year