google-research-datasets / maverics
MAVERICS (Manually-vAlidated Vq²a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering (VQA).
☆13 · Updated 2 years ago
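Since MAVERICS is a test-only VQA benchmark, evaluation typically means scoring model answers against human-annotated references. The sketch below implements the standard VQA soft-accuracy metric (an answer counts as fully correct if at least three annotators gave it); whether MAVERICS uses exactly this scoring and preprocessing is an assumption, so check the repository's README for its actual protocol.

```python
# Standard VQA soft-accuracy metric sketch. This is the common VQA convention,
# not necessarily MAVERICS's exact evaluation protocol (an assumption here).

def vqa_accuracy(prediction: str, reference_answers: list[str]) -> float:
    """Score one prediction against a list of annotator answers.

    An answer is fully correct (1.0) if at least 3 annotators gave it;
    otherwise it earns partial credit of matches / 3.
    """
    normalized = prediction.strip().lower()
    matches = sum(a.strip().lower() == normalized for a in reference_answers)
    return min(1.0, matches / 3)


# Example: 10 annotators all answered "cat".
print(vqa_accuracy("cat", ["cat"] * 10))  # 1.0
```

Real VQA evaluation also applies answer normalization (articles, punctuation, number words) before matching; that step is omitted here for brevity.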
Alternatives and similar repositories for maverics
Users interested in maverics are comparing it to the libraries listed below.
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023) · ☆33 · Updated 2 years ago
- Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound" · ☆145 · Updated 3 years ago
- Extended COCO Validation (ECCV) Caption dataset (ECCV 2022) · ☆56 · Updated last year
- Command-line tool for downloading and extending the RedCaps dataset · ☆50 · Updated last year
- Official implementation of GRIT-VLP · ☆21 · Updated 3 years ago
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks (CVPR 2022 Oral) · ☆48 · Updated last year
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023) · ☆143 · Updated 5 months ago
- PyTorch version of "VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer" (NeurIPS 2021) · ☆56 · Updated 2 years ago
- CLIP-It! Language-Guided Video Summarization · ☆75 · Updated 4 years ago
- A Unified Framework for Video-Language Understanding · ☆60 · Updated 2 years ago
- Localized Narratives · ☆86 · Updated 4 years ago
- Research code for "Training Vision-Language Transformers from Captions Alone" · ☆34 · Updated 3 years ago
- A task-agnostic vision-language architecture as a step towards General Purpose Vision · ☆92 · Updated 4 years ago
- PyTorch code for "TVLT: Textless Vision-Language Transformer" (NeurIPS 2022 Oral) · ☆124 · Updated 2 years ago
- Data release for the VALUE benchmark · ☆30 · Updated 3 years ago
- Code for the ACL 2023 oral paper "ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning" · ☆12 · Updated 2 months ago
- Multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo contains the traini… · ☆40 · Updated 2 years ago
- L-Verse: Bidirectional Generation Between Image and Text · ☆109 · Updated 7 months ago
- PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners" · ☆115 · Updated 3 years ago
- Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO · ☆54 · Updated 5 years ago
- [ICLR 2024] Code and models for "COSA: Concatenated Sample Pretrained Vision-Language Foundation Model" · ☆43 · Updated 10 months ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa… · ☆78 · Updated 2 years ago
- A PyTorch implementation of "Multimodal Few-Shot Learning with Frozen Language Models" with OPT · ☆43 · Updated 3 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark · ☆54 · Updated 2 years ago
- Official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 2023) · ☆52 · Updated 2 years ago
- MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning (ACL 2023) · ☆36 · Updated last year