SatyamGaba / visual_question_answeringLinks

Visual Question Answering in PyTorch with various Attention Models

☆20

Alternatives and similar repositories for visual_question_answering

Users that are interested in visual_question_answering are comparing it to the libraries listed below

Sorting:

Dong-JinKim / DenseRelationalCaptioning
Code of Dense Relational Captioning
☆69Updated 2 years ago
antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆123Updated last year
soloist97 / densecap-pytorch
A simplified pytorch version of densecap
☆41Updated 7 months ago
Shivanshu-Gupta / Visual-Question-Answering
CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering
☆75Updated 5 years ago
ntusteeian / VQA_CNN-LSTM
Pytorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta…
☆20Updated 4 years ago
tbmoon / basic_vqa
Pytorch VQA : Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf)
☆95Updated last year
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆59Updated 2 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆68Updated 4 years ago
RoyalSkye / Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
☆78Updated 3 years ago
davidnvq / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆194Updated 2 years ago
salaniz / pycocoevalcap
Python 3 support for the MS COCO caption evaluation tools
☆321Updated 11 months ago
ammesatyajit / VideoBERT
Using VideoBERT to tackle video prediction
☆129Updated 4 years ago
saahiluppal / catr
Image Captioning Using Transformer
☆268Updated 3 years ago
medhini / clip_it
CLIP-It! Language-Guided Video Summarization
☆74Updated 4 years ago
jayleicn / TVCaption
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
☆90Updated last year
UARK-AICV / VLTinT
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
☆67Updated last year
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆197Updated last year
TheShadow29 / VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
☆60Updated 3 years ago
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188Updated 2 months ago
thaolmk54 / hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
☆133Updated 11 months ago
allenai / gpv-1
A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92Updated 4 years ago
zarzouram / image_captioning_with_transformers
Pytorch implementation of image captioning using transformer-based model.
☆66Updated 2 years ago
zhangxuying1004 / RSTNet
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123Updated 2 years ago
facebookresearch / connect-caption-and-trace
A unified framework to jointly model images, text, and human attention traces.
☆78Updated 4 years ago
yikuan8 / Transformers-VQA
An implementation that downstreams pre-trained V+L models to VQA tasks. Now support: VisualBERT, LXMERT, and UNITER
☆164Updated 2 years ago
tgc1997 / Awesome-Video-Captioning
A curated list of research papers in Video Captioning
☆120Updated 4 years ago
yahoo / object_relation_transformer
Implementation of the Object Relation Transformer for Image Captioning
☆178Updated 10 months ago
RachanaJayaram / Cross-Attention-VizWiz-VQA
A self-evident application of the VQA task is to design systems that aid blind people with sight reliant queries. The VizWiz VQA dataset …
☆15Updated last year
wtliao / ImageTransformer
Image Captioning through Image Transformer
☆40Updated 4 years ago
salesforce / paprika
Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding"
☆49Updated 5 months ago