gicheonkang / gst-visdial

Official PyTorch Implementation for CVPR'23 Paper, "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training"

☆20

Alternatives and similar repositories for gst-visdial

Users who are interested in gst-visdial are comparing it to the libraries listed below.

MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" NeurIPS 23 Spotlight
☆37 Updated 2 years ago
gicheonkang / sglkt-visdial
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"
☆13 Updated 2 years ago
MikeWangWZHL / VidIL
PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆115 Updated 2 years ago
sail-sg / VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
☆48 Updated 2 years ago
zjuchenlong / WSAG
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
☆14 Updated last year
google-deepmind / svo_probes
The SVO-Probes Dataset for Verb Understanding
☆31 Updated 3 years ago
ajd12342 / why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆30 Updated 2 years ago
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43 Updated 2 months ago
microsoft / PICa
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)
☆85 Updated 3 years ago
aioz-ai / CFR_VQA
Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22)
☆45 Updated 2 years ago
allenai / aokvqa
Official repository for the A-OKVQA dataset
☆96 Updated last year
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆82 Updated last year
phellonchen / awesome-visual-dialog
Recent Advances in Visual Dialog
☆30 Updated 2 years ago
StanfordVL / atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…
☆51 Updated last year
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆33 Updated 2 years ago
szzexpoi / rex
Official Repository for CVPR 2022 paper "REX: Reasoning-aware and Grounded Explanation"
☆22 Updated last year
zmykevin / UVLP
CVPR 2022 (Oral) PyTorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆22 Updated 3 years ago
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆14 Updated 2 months ago
layer6ai-labs / SGG-Seq2Seq
Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"
☆43 Updated 3 years ago
cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
☆128 Updated 2 years ago
szzexpoi / POEM
Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…
☆10 Updated last year
JiwanChung / esper
ESPER
☆23 Updated last year
limanling / KnowledgeVL-Reading
☆68 Updated 2 years ago
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆187 Updated last year
jayleicn / VideoLanguageFuturePred
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
☆50 Updated 2 years ago
bcmi / Causal-VidQA
[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The…
☆71 Updated last month
jialinwu17 / MAVEX
☆30 Updated 2 years ago
YYJMJC / Compositional-Temporal-Grounding
☆31 Updated 3 years ago
aimagelab / PMA-Net
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023
☆18 Updated last year
antoyang / FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆157 Updated 8 months ago