aimagelab / speakseeLinks

PyTorch library for Visual-Semantic tasks

☆29

Alternatives and similar repositories for speaksee

Users that are interested in speaksee are comparing it to the libraries listed below

Sorting:

jayleicn / TVQA
[EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Question Answering
☆181Updated 3 years ago
LuoweiZhou / densecap
Dense video captioning in PyTorch
☆41Updated 6 years ago
facebookresearch / ActivityNet-Entities
A Dataset for Grounded Video Description
☆163Updated 3 years ago
chihyaoma / cyclical-visual-captioning
PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision
☆46Updated 5 years ago
jz462 / Large-Scale-VRD.pytorch
Implementation for the AAAI2019 paper "Large-scale Visual Relationship Understanding"
☆146Updated 6 years ago
niluthpol / multimodal_vtt
Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
☆68Updated 5 years ago
LuoweiZhou / ProcNets-YouCook2
Source code for paper "Towards Automatic Learning of Procedures from Web Instructional Videos"
☆34Updated 6 years ago
lichengunc / pretrain-vl-data
Pre-trained V+L Data Preparation
☆46Updated 5 years ago
ruotianluo / GoogleConceptualCaptioning
☆54Updated 5 years ago
yj-yu / lsmdc
☆33Updated 7 years ago
nocaps-org / updown-baseline
Baseline model for nocaps benchmark, ICCV 2019 paper "nocaps: novel object captioning at scale".
☆76Updated 2 years ago
jayleicn / TVQAplus
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
☆130Updated 3 years ago
ranjaykrishna / densevid_eval
Evaluation code for Dense-Captioning Events in Videos
☆129Updated 6 years ago
fawazsammani / show-edit-tell
Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020
☆81Updated 5 years ago
salesforce / densecap
☆191Updated 5 months ago
hassanhub / MultiGrounding
This is the repo for Multi-level textual grounding
☆34Updated 5 years ago
yuleiniu / rva
Code for CVPR'19 "Recursive Visual Attention in Visual Dialog"
☆64Updated 2 years ago
antoine77340 / Mixture-of-Embedding-Experts
Mixture-of-Embeddings-Experts
☆120Updated 5 years ago
ruotianluo / Transformer_Captioning
Use transformer for captioning
☆156Updated 6 years ago
MichiganCOG / Video-Grounding-from-Text
Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"
☆46Updated last year
jimmy646 / violin
Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference"
☆162Updated 5 years ago
aimbrain / vqa-project
Code for our paper: Learning Conditioned Graph Structures for Interpretable Visual Question Answering
☆150Updated 6 years ago
jayleicn / TVCaption
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
☆90Updated 2 years ago
VisionLearningGroup / Text-to-Clip_Retrieval
Implementation for "Multilevel Language and Vision Integration for Text-to-Clip Retrieval"
☆49Updated 6 years ago
XiangChenchao / DDPN
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
☆23Updated 7 years ago
jayleicn / recurrent-transformer
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
☆171Updated 4 years ago
richardaecn / cvpr18-caption-eval
Learning to Evaluate Image Captioning. CVPR 2018
☆85Updated 7 years ago
Cadene / murel.bootstrap.pytorch
MUREL (CVPR 2019), a multimodal relational reasoning module for VQA
☆195Updated 5 years ago
GriffinLiang / vrd-dsr
Code for Visual Relationship Detection with Deep Structural Ranking (AAAI2018)
☆122Updated 5 years ago
shijx12 / XNM-Net
Pytorch implementation of "Explainable and Explicit Visual Reasoning over Scene Graphs "
☆93Updated 6 years ago