passerby233 / Collection-of-Visual-Storytelling-StoryNLPLinks

This repository aims to collect the articles and codes for the Visual Storytelling (VIST) task. VIST is a vision-and-language task. It aims to summarize the idea of a photo stream and tells a story about it (in natural language). Be careful about its difference from the "storytelling with data", which is more related to data visualization.

☆23

Alternatives and similar repositories for Collection-of-Visual-Storytelling-StoryNLP

Users that are interested in Collection-of-Visual-Storytelling-StoryNLP are comparing it to the libraries listed below

Sorting:

YoadTew / zero-shot-video-to-text
☆76Updated 3 years ago
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188Updated 6 months ago
MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆115Updated 3 years ago
antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆124Updated 2 years ago
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆107Updated 9 months ago
ShiYaya / emscore
Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"
☆26Updated 3 years ago
jayleicn / VideoLanguageFuturePred
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
☆51Updated 3 years ago
Deferf / CLIP_Video_Representation
Use CLIP to represent video for Retrieval Task
☆70Updated 4 years ago
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆26Updated 2 years ago
TheShadow29 / VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
☆62Updated 4 years ago
jayleicn / TVCaption
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
☆90Updated 2 years ago
mzhaoshuai / CenterCLIP
[SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval.
☆133Updated 3 years ago
google-research-datasets / Video-Timeline-Tags-ViTT
A collection of videos annotated with timelines where each video is divided into segments, and each segment is labelled with a short free…
☆29Updated 3 years ago
terry-r123 / Awesome-Captioning
A curated list of Multimodal Captioning related research(including image captioning, video captioning, and text captioning)
☆112Updated 3 years ago
medhini / Instructional-Video-Summarization
Code for paper, "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" ECCV 2022
☆39Updated 2 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆68Updated 4 years ago
DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
☆42Updated 2 years ago
Dawn-LX / VidSGG-BIG
Pytorch implementation of our paper Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs, which i…
☆48Updated 2 years ago
yashkant / concat-vqa
Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021
☆19Updated 4 years ago
jayleicn / mTVRetrieval
[ACL 2021] mTVR: Multilingual Video Moment Retrieval
☆27Updated 3 years ago
ttengwang / dense-video-captioning-pytorch
Second-place solution to dense video captioning task in ActivityNet Challenge (CVPR 2020 workshop)
☆75Updated 4 years ago
niluthpol / weak_supervised_video_moment
Weakly Supervised Video Moment Retrieval from Text Queries
☆43Updated 5 years ago
papermsucode / mdmmt
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
☆26Updated 4 years ago
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆189Updated last year
thunlp / VisualDS
☆25Updated 3 years ago
ethan5437 / PR-VIST
Github repository for Plot and Rework: Modeling Storylines for Visual Storytelling (ACL-IJCNLP2021 Findings)
☆21Updated 3 years ago
jayleicn / TVQAplus
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
☆130Updated 3 years ago
soloist97 / densecap-pytorch
A simplified pytorch version of densecap
☆42Updated 11 months ago
jssprz / video_captioning_datasets
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Pe…
☆130Updated 2 years ago
klauscc / VindLU
☆110Updated 2 years ago