BierOne / bottom-up-attention-vqaLinks

An updated PyTorch implementation of hengyuan-hu's version for 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering'

☆35

Alternatives and similar repositories for bottom-up-attention-vqa

Users that are interested in bottom-up-attention-vqa are comparing it to the libraries listed below

Sorting:

ezeli / BUTD_model
A pytorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
☆48Updated 4 years ago
luo3300612 / image-captioning-DLCT
Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
☆202Updated 3 years ago
232525 / PureT
Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]
☆69Updated last year
ruotianluo / coco-caption
☆68Updated 3 years ago
MILVLG / bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
☆304Updated 3 years ago
CrossmodalGroup / SSL-VQA
Code for our IJCAI2020 paper: Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
☆52Updated 5 years ago
JDAI-CV / image-captioning
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
☆275Updated 4 years ago
MILVLG / mmnas
Deep Multimodal Neural Architecture Search
☆29Updated 5 years ago
YiwuZhong / Sub-GC
[ECCV 2020] Official code for "Comprehensive Image Captioning via Scene Graph Decomposition"
☆99Updated last year
zhangxuying1004 / RSTNet
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123Updated 2 years ago
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆60Updated 3 years ago
cshizhe / asg2cap
Code accompanying the paper "Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs" (Chen et al., …
☆200Updated 3 years ago
hobincar / SGN
Official pytorch implementation of the AAAI 2021 paper "Semantic Grouping Network for Video Captioning"
☆54Updated 4 years ago
terry-r123 / Awesome-Captioning
A curated list of Multimodal Captioning related research(including image captioning, video captioning, and text captioning)
☆112Updated 3 years ago
yahoo / object_relation_transformer
Implementation of the Object Relation Transformer for Image Captioning
☆180Updated last year
sks3i / pycocoevalcap
Microsoft COCO Caption Evaluation Tool - Python 3
☆33Updated 6 years ago
yangxuntu / SGAE
☆219Updated 3 years ago
yangbang18 / Non-Autoregressive-Video-Captioning
The PyTorch code of the AAAI2021 paper "Non-Autoregressive Coarse-to-Fine Video Captioning".
☆58Updated 2 years ago
entalent / MemCap
code for paper `MemCap: Memorizing Style Knowledge for Image Captioning`
☆11Updated 5 years ago
LgQu / DIME
Dynamic Modality Interaction Modeling for Image-Text Retrieval. SIGIR'21
☆71Updated 3 years ago
CrossmodalGroup / GSMN
Implementation of our CVPR2020 paper, Graph Structured Network for Image-Text Matching
☆169Updated 5 years ago
violetteshev / bottom-up-features
Bottom-up features extractor implemented in PyTorch.
☆72Updated 6 years ago
facebookresearch / grid-feats-vqa
Grid features pre-training code for visual question answering
☆269Updated 4 years ago
woodfrog / vse_infty
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)
☆162Updated 3 months ago
aioz-ai / ICCV19_VQA-CTI
Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)
☆38Updated 3 years ago
guanghuixu / AnchorCaptioner
☆31Updated 4 years ago
jacobswan1 / ViTCAP
Implementation for CVPR 2022 paper " Injecting Semantic Concepts into End-to-End Image Captionin".
☆43Updated 3 years ago
shilrley6 / Faster-R-CNN-with-model-pretrained-on-Visual-Genome
Faster RCNN model in Pytorch version, pretrained on the Visual Genome with ResNet 101
☆239Updated 3 years ago
LgQu / CAMERA
Context-Aware Multi-View Summarization Network for Image-Text Matching. ACM MM'20
☆30Updated 3 years ago
mesnico / TERAN
Code and Resources for the Transformer Encoder Reasoning and Alignment Network (TERAN), accepted for publication in ACM Transactions on M…
☆74Updated 2 years ago