xinke-wang / Awesome-Text-VQA

☆187

Alternatives and similar repositories for Awesome-Text-VQA

Users who are interested in Awesome-Text-VQA are comparing it to the repositories listed below.

microsoft / TAP
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
☆72 Updated 2 years ago
ronghanghu / mmf
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45 Updated 3 years ago
ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI 2021]
☆57 Updated 3 years ago
yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆64 Updated 3 years ago
ChenyuGAO-CS / SMA
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11 Updated 3 years ago
uakarsh / latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…
☆53 Updated 9 months ago
researchmm / soho
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
☆208 Updated 2 years ago
shilrley6 / Faster-R-CNN-with-model-pretrained-on-Visual-Genome
Faster R-CNN model in PyTorch, pretrained on Visual Genome with ResNet-101
☆237 Updated 2 years ago
zdou0830 / METER
METER: A Multimodal End-to-end TransformER Framework
☆373 Updated 2 years ago
uta-smile / TCL
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
☆265 Updated 10 months ago
BryanPlummer / flickr30k_entities
Flickr30K Entities Dataset
☆177 Updated 6 years ago
zengyan-97 / CCLM
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
☆91 Updated 2 years ago
facebookresearch / grid-feats-vqa
Grid features pre-training code for visual question answering
☆269 Updated 3 years ago
ronghanghu / vqa-maskrcnn-benchmark-m4c
Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…
☆13 Updated 5 years ago
li-xirong / coco-cn
Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks
☆201 Updated 5 months ago
zyang-ur / onestage_grounding
A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)
☆148 Updated 4 years ago
linjieli222 / HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆233 Updated 3 years ago
wzk1015 / CNMT
[AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
☆24 Updated 2 years ago
zhanxlin / Product1M
Product1M
☆87 Updated 2 years ago
pzzhang / VinVL
Project page for VinVL
☆356 Updated 2 years ago
igorbrigadir / DownloadConceptualCaptions
Reliably download millions of images efficiently
☆116 Updated 4 years ago
HAWLYQ / Qc-TextCap
☆16 Updated 3 years ago
MILVLG / bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
☆302 Updated 3 years ago
clip-vil / CLIP-ViL
[ICLR 2022] Code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
☆416 Updated 2 years ago
forence / Awesome-Visual-Captioning
This repository focuses on Image Captioning, Video Captioning, Seq-to-Seq Learning, and NLP
☆412 Updated 2 years ago
UKPLab / MMT-Retrieval
☆131 Updated 2 years ago
AndresPMD / StacMR
Scene Text Aware Cross Modal Retrieval (StacMR)
☆24 Updated 3 years ago
xiaojino / RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10 Updated 2 years ago
phellonchen / awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
☆292 Updated 2 years ago
furkanbiten / stvqa_amazon_ocr
STVQA and TextVQA OCR results from Amazon Text in Image pipeline
☆11 Updated 3 years ago