facebookresearch / connect-caption-and-traceLinks

A unified framework to jointly model images, text, and human attention traces.

☆79

Alternatives and similar repositories for connect-caption-and-trace

Users that are interested in connect-caption-and-trace are comparing it to the libraries listed below

Sorting:

google / localized-narratives
Localized Narratives
☆86Updated 4 years ago
jimmy646 / violin
Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference"
☆162Updated 5 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆68Updated 4 years ago
chihyaoma / cyclical-visual-captioning
PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision
☆46Updated 5 years ago
intersun / LightningDOT
source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT
☆72Updated 3 years ago
medhini / clip_it
CLIP-It! Language-Guided Video Summarization
☆75Updated 4 years ago
airsplay / vimpac
☆73Updated 3 years ago
tsujuifu / pytorch_violet
A PyTorch implementation of VIOLET
☆140Updated last year
LuoweiZhou / densecap
Dense video captioning in PyTorch
☆41Updated 6 years ago
Deferf / CLIP_Video_Representation
Use CLIP to represent video for Retrieval Task
☆70Updated 4 years ago
jayleicn / recurrent-transformer
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
☆171Updated 5 years ago
VALUE-Leaderboard / DataRelease
Data Release for VALUE Benchmark
☆30Updated 3 years ago
alasdairtran / transform-and-tell
[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning
☆92Updated last year
jayleicn / TVCaption
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
☆90Updated 2 years ago
YuanEZhou / Grounded-Image-Captioning
☆64Updated 3 years ago
allenai / gpv-1
A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92Updated 4 years ago
jayleicn / TVQAplus
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
☆130Updated 3 years ago
YuanEZhou / satic
☆26Updated 4 years ago
fawazsammani / show-edit-tell
Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020
☆81Updated 5 years ago
BigRedT / info-ground
Learning phrase grounding from captioned images through InfoNCE bound on mutual information
☆74Updated 5 years ago
facebookresearch / ActivityNet-Entities
A Dataset for Grounded Video Description
☆163Updated 3 years ago
LuoweiZhou / YouCook2-Leaderboard
A one-stop shop for YouCook2 info such as leaderboard and recent advances on (cooking) video retrieval and captioning.
☆40Updated 3 years ago
TheShadow29 / vognet-pytorch
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
☆70Updated 5 years ago
TheShadow29 / VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
☆62Updated 4 years ago
VALUE-Leaderboard / StarterCode
Starter Code for VALUE benchmark
☆80Updated 3 years ago
igorbrigadir / DownloadConceptualCaptions
Reliably download millions of images efficiently
☆118Updated 4 years ago
gsig / visual-grounding
Project page for "Visual Grounding in Video for Unsupervised Word Translation" CVPR 2020
☆42Updated 5 years ago
antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆125Updated 2 years ago
MichiganCOG / Video-Grounding-from-Text
Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"
☆47Updated last year
antoine77340 / S3D_HowTo100M
S3D Text-Video model trained on HowTo100M using MIL-NCE
☆200Updated 5 years ago