harsh19 / spot-the-diff

EMNLP 2018. Learning to Describe Differences Between Pairs of Similar Images. Harsh Jhamtani, Taylor Berg-Kirkpatrick.

☆66

Alternatives and similar repositories for spot-the-diff

Users who are interested in spot-the-diff are comparing it to the repositories listed below.


Seth-Park / RobustChangeCaptioning
Code and dataset release for Park et al., Robust Change Captioning (ICCV 2019)
☆49 Updated 2 years ago
intersun / LightningDOT
Source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper LightningDOT
☆72 Updated 2 years ago
airsplay / VisualRelationships
Data of ACL 2019 Paper "Expressing Visual Relationships via Language".
☆62 Updated 5 years ago
adobe-research / vaw_dataset
This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in th…
☆67 Updated 3 years ago
LuoweiZhou / coco-caption
kdexd/coco-caption@de6f385
☆26 Updated 5 years ago
igorbrigadir / DownloadConceptualCaptions
Reliably download millions of images efficiently
☆117 Updated 4 years ago
google-research-datasets / Crisscrossed-Captions
Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
☆53 Updated 5 years ago
Cuberick-Orion / CIRR
Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
☆122 Updated this week
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆88 Updated 2 years ago
sushizixin / CLIP4IDC
CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)
☆36 Updated 2 years ago
allenai / gpv-1
A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92 Updated 4 years ago
LisaAnne / TemporalLanguageRelease
☆43 Updated 4 years ago
zerovl / ZeroVL
[ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources
☆45 Updated 3 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆68 Updated 4 years ago
mad-red / VSR-guided-CIC
Human-like Controllable Image Captioning with Verb-specific Semantic Roles.
☆36 Updated 3 years ago
MikeWangWZHL / VidIL
PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆115 Updated 3 years ago
easonnie / mlp-vil
MLPs for Vision and Language Modeling (Coming Soon)
☆27 Updated 3 years ago
guilk / VLC
Research code for "Training Vision-Language Transformers from Captions Alone"
☆34 Updated 3 years ago
allenai / swig
Situation With Groundings (SWiG) dataset and Joint Situation Localizer (JSL)
☆68 Updated 4 years ago
redcaps-dataset / redcaps-downloader
Command-line tool for downloading and extending the RedCaps dataset.
☆49 Updated last year
mshukor / ViCHA
[BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"
☆55 Updated 3 years ago
alasdairtran / transform-and-tell
[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning
☆92 Updated last year
jwehrmann / lavse
Language-Agnostic Visual-Semantic Embeddings (ICCV'19)
☆22 Updated 5 years ago
YuanEZhou / Grounded-Image-Captioning
☆64 Updated 3 years ago
ChenyunWu / PhraseCutDataset
Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"
☆112 Updated 5 years ago
allenai / gpv2
☆32 Updated 3 years ago
princetonvisualai / pointingqa
Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"
☆19 Updated 3 years ago
DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
☆42 Updated 2 years ago
showlab / Region_Learner
The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
☆42 Updated 3 years ago
e-bug / cross-modal-ablation
[EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers…
☆20 Updated 3 years ago