ronghanghu / vqa-maskrcnn-benchmark-m4c

Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_feature.py

☆13

Alternatives and similar repositories for vqa-maskrcnn-benchmark-m4c

Users who are interested in vqa-maskrcnn-benchmark-m4c compare it with the repositories listed below.

xinke-wang / Awesome-Text-VQA
☆188 Updated last year
ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021]
☆57 Updated 3 years ago
microsoft / TAP
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
☆73 Updated 2 years ago
ronghanghu / mmf
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45 Updated 4 years ago
HAWLYQ / Qc-TextCap
☆16 Updated 3 years ago
ChenyuGAO-CS / SMA
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11 Updated 3 years ago
xiaojino / RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10 Updated 2 years ago
furkanbiten / stvqa_amazon_ocr
STVQA and TextVQA OCR results from Amazon Text in Image pipeline
☆11 Updated 3 years ago
uakarsh / latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…
☆55 Updated last year
yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆65 Updated 4 years ago
MILVLG / bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
☆304 Updated 3 years ago
HenryJunW / TAG
☆22 Updated 2 years ago
ruotianluo / coco-caption
☆68 Updated 3 years ago
ezeli / BUTD_model
A PyTorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
☆47 Updated 4 years ago
nttmdlab-nlp / VisualMRC
VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
☆56 Updated 7 months ago
airsplay / py-bottom-up-attention
PyTorch bottom-up attention with Detectron2
☆236 Updated 3 years ago
alirezasalemi7 / DEDR-MM-FiD
The code for the paper: A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
☆12 Updated 2 years ago
terry-r123 / Awesome-Captioning
A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
☆112 Updated 3 years ago
shilrley6 / Faster-R-CNN-with-model-pretrained-on-Visual-Genome
Faster R-CNN model in PyTorch, pretrained on Visual Genome with ResNet-101
☆239 Updated 3 years ago
facebookresearch / grid-feats-vqa
Grid features pre-training code for visual question answering
☆269 Updated 4 years ago
hwanheelee1993 / ViLBERTScore
Code for ViLBERTScore in EMNLP Eval4NLP
☆18 Updated 3 years ago
CrossmodalGroup / SSL-VQA
Code for our IJCAI2020 paper: Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
☆52 Updated 5 years ago
pzzhang / VinVL
Project page for VinVL
☆359 Updated 2 years ago
SjokerLily / awesome-image-captioning
A paper list of image captioning.
☆22 Updated 3 years ago
AndresPMD / StacMR
Scene Text Aware Cross Modal Retrieval (StacMR)
☆24 Updated 4 years ago
BierOne / bottom-up-attention-vqa
An updated PyTorch implementation of hengyuan-hu's version for 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question…
☆35 Updated 3 years ago
ThalesGroup / ConceptBERT
Implementation of ConceptBERT: Concept-Aware Representation for Visual Question Answering
☆31 Updated last year
ezeli / Transformer_model
A PyTorch implementation of Attention Is All You Need (Transformer) for image captioning.
☆12 Updated 4 years ago
yikuan8 / Transformers-VQA
An implementation that applies pre-trained V+L models to downstream VQA tasks. Currently supports VisualBERT, LXMERT, and UNITER
☆165 Updated 2 years ago
zhangxuying1004 / RSTNet
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123 Updated 2 years ago