doc-doc/CoVGT

Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)

☆20

Alternatives and similar repositories for CoVGT

Users interested in CoVGT are comparing it to the repositories listed below.

yl3800 / TranSTR
☆12 · Dec 15, 2023 · Updated 2 years ago
sail-sg / VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
☆49 · Jun 8, 2023 · Updated 3 years ago
doc-doc / HQGA
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
☆35 · Sep 17, 2022 · Updated 3 years ago
showlab / mist
☆37 · Dec 20, 2023 · Updated 2 years ago
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆89 · Jul 1, 2024 · Updated 2 years ago
WHB139426 / GCG
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
☆10 · Jul 22, 2024 · Updated 2 years ago
jingchenchen / ReasoningConsistency-VQA
☆13 · Aug 14, 2022 · Updated 3 years ago
StanfordVL / atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…
☆51 · May 29, 2024 · Updated 2 years ago
zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆18 · Oct 9, 2024 · Updated last year
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆198 · Jan 14, 2024 · Updated 2 years ago
yl3800 / EIGV
☆15 · Aug 12, 2022 · Updated 3 years ago
yl3800 / IGV
This repo contains code for Invariant Grounding for Video Question Answering
☆27 · Mar 2, 2023 · Updated 3 years ago
MGitHubL / TMac
☆14 · Feb 26, 2024 · Updated 2 years ago
HCPLab-SYSU / STKET
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)
☆19 · Mar 13, 2024 · Updated 2 years ago
antoyang / FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆159 · Dec 9, 2024 · Updated last year
zhangxi1997 / VQACL
VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23)
☆45 · Mar 28, 2024 · Updated 2 years ago
doc-doc / NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
☆189 · Aug 2, 2025 · Updated 11 months ago
thaolmk54 / hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
☆135 · Jul 25, 2024 · Updated last year
bigai-nlco / VideoTGB
[EMNLP 2024] A Video Chat Agent with Temporal Prior
☆33 · Mar 2, 2025 · Updated last year
ahmedssabir / Belief-Revision-Score
Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022
☆11 · Apr 13, 2025 · Updated last year
JonghwanMun / MarioQA
Repository for MarioQA: Answering Questions by Watching Gameplay Videos in ICCV 2017
☆10 · Oct 28, 2025 · Updated 8 months ago
makarandtapaswi / MovieQA_CVPR2016
Contains approaches introduced in the MovieQA benchmark dataset paper
☆78 · Nov 30, 2016 · Updated 9 years ago
Zhiquan-Wen / D-VQA
PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)
☆26 · Oct 13, 2022 · Updated 3 years ago
VRU-NExT / VideoQA
☆104 · Oct 19, 2022 · Updated 3 years ago
JerryYLi / svitt
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
☆21 · Jun 16, 2023 · Updated 3 years ago
BaoBaoGitHub / Hungyi_Lee_Machine_Learning_2021
Notes on Hung-yi Lee's Machine Learning 2021 course
☆14 · Nov 27, 2022 · Updated 3 years ago
QiQAng / UEDVC
☆12 · May 26, 2023 · Updated 3 years ago
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆77 · Mar 26, 2025 · Updated last year
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆350 · Jul 19, 2024 · Updated 2 years ago
dair-iitd / nsrmp
NSRM: Neuro-Symbolic Robot Manipulation
☆18 · Jul 11, 2023 · Updated 3 years ago
lianshiwei / datavisualization.github.io
Visualization of China's historical GDP and population data
☆13 · Jan 18, 2023 · Updated 3 years ago
MILVLG / bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
☆301 · Apr 7, 2022 · Updated 4 years ago
wdrink / STTS
Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection.
☆52 · Jul 13, 2022 · Updated 4 years ago
sukrutrao / Model-Guidance
Code for the paper: Studying How to Efficiently and Effectively Guide Models with Explanations. ICCV 2023.
☆19 · Nov 1, 2023 · Updated 2 years ago
UARK-AICV / VLTinT
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
☆68 · Feb 16, 2024 · Updated 2 years ago
quqxui / MMRNS
Code for the paper "Relation-enhanced Negative Sampling for Multimodal Knowledge Graph Completion" (ACM MM22)
☆34 · May 22, 2024 · Updated 2 years ago
SummerRaining / videoqa_keras
videoqa, for the Tianchi Zhijiang Cup video question answering competition
☆13 · Dec 19, 2018 · Updated 7 years ago
tho-kn / Ego3DPose
Official repository of the "Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views" (SIGGRAPH Asia 2023)
☆10 · Dec 24, 2024 · Updated last year
imagegridworth / IG-VLM
☆138 · Sep 29, 2024 · Updated last year