noagarcia / ArtVQA
AQUA dataset and VIKING model for the task of Art Visual Question Answering
☆22 Updated 3 years ago
Related projects
Alternatives and complementary repositories for ArtVQA
- Command-line tool for downloading and extending the RedCaps dataset. ☆45 Updated 11 months ago
- [EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers… ☆20 Updated 2 years ago
- Source code and pre-trained/fine-tuned checkpoint for the NAACL 2021 paper LightningDOT ☆73 Updated 2 years ago
- This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in th… ☆63 Updated 2 years ago
- kdexd/coco-caption@de6f385 ☆26 Updated 4 years ago
- ECCV2020 paper: Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. Code and Data. ☆85 Updated last year
- Official implementation of the Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) | ICCV 2021 - Image Retrieval o… ☆36 Updated 4 months ago
- ☆25 Updated 3 years ago
- Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021 ☆19 Updated 3 years ago
- ☆32 Updated 2 years ago
- Data of ACL 2019 Paper "Expressing Visual Relationships via Language". ☆62 Updated 4 years ago
- ☆31 Updated 6 years ago
- ☆34 Updated last year
- [ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources ☆44 Updated 2 years ago
- A paper list of visual semantic embeddings and text-image retrieval. ☆41 Updated 3 years ago
- This dataset contains about 110k images annotated with the depth and occlusion relationships between arbitrary objects. It enables resear… ☆16 Updated 3 years ago
- This is the repo for Multi-level textual grounding ☆33 Updated 4 years ago
- [BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment" ☆54 Updated 2 years ago
- An image caption dataset about images from www.dpchallenge.com. ☆12 Updated 4 years ago
- ☆50 Updated 2 years ago
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning ☆91 Updated 7 months ago
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval ☆26 Updated 3 years ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023) ☆32 Updated last year
- A dataset of crowdsourced ratings for machine-generated image captions ☆33 Updated 5 years ago
- Data Release for VALUE Benchmark ☆31 Updated 2 years ago
- Official code repository for the EMNLP 2021 paper ☆26 Updated 2 years ago
- Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022) ☆47 Updated last year
- Code, data, models for the Sherlock corpus ☆55 Updated 2 years ago
- ☆74 Updated 2 years ago
- A large-scale dataset for instance-level recognition for artworks is introduced. ☆47 Updated last year