microsoft/TAP

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

☆72

Alternatives and similar repositories for TAP

Users who are interested in TAP are comparing it to the libraries listed below.

uakarsh / latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…
☆56 · Oct 30, 2024 · Updated last year
ChenyuGAO-CS / SMA
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11 · Nov 30, 2021 · Updated 4 years ago
yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆65 · Sep 15, 2021 · Updated 4 years ago
guanghuixu / CRN_tvqa
☆15 · Oct 27, 2020 · Updated 5 years ago
xinke-wang / Awesome-Text-VQA
☆188 · May 8, 2024 · Updated 2 years ago
ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021]
☆57 · Apr 5, 2022 · Updated 4 years ago
xiaojino / RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10 · Nov 27, 2022 · Updated 3 years ago
ronghanghu / mmf
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45 · Aug 26, 2021 · Updated 4 years ago
HenryJunW / TAG
☆22 · Dec 8, 2022 · Updated 3 years ago
furkanbiten / stvqa_amazon_ocr
STVQA and TextVQA OCR results from Amazon Text in Image pipeline
☆12 · Jul 18, 2022 · Updated 4 years ago
wzk1015 / CNMT
[AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
☆24 · Mar 29, 2023 · Updated 3 years ago
bytedance / VTVQA
Towards Video Text Visual Question Answering: Benchmark and Baseline
☆41 · Feb 26, 2024 · Updated 2 years ago
csguoh / KD-LTR
[MM2023] An official implementation of the paper "One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer"
☆16 · Nov 3, 2023 · Updated 2 years ago
guanghuixu / AnchorCaptioner
☆30 · May 7, 2021 · Updated 5 years ago
ronghanghu / vqa-maskrcnn-benchmark-m4c
Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…
☆13 · Jan 30, 2020 · Updated 6 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆90 · Jun 12, 2023 · Updated 3 years ago
microsoft / GenerativeImage2Text
GIT: A Generative Image-to-text Transformer for Vision and Language
☆582 · Dec 2, 2023 · Updated 2 years ago
zhaominyiz / EPiDA
Official Code for 'EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification' - NAACL 2022
☆23 · May 9, 2022 · Updated 4 years ago
MCLAB-OCR / KnowledgeMiningWithSceneText
☆38 · Feb 4, 2023 · Updated 3 years ago
pzzhang / VinVL
project page for VinVL
☆360 · Jul 26, 2023 · Updated 2 years ago
zhaominyiz / STIRER
STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition -- ACMMM 2023
☆14 · Dec 2, 2024 · Updated last year
furkanbiten / idl_data
OCR Annotations from Amazon Textract for Industry Documents Library
☆103 · Aug 20, 2022 · Updated 3 years ago
microsoft / vision-datasets
☆19 · Mar 24, 2025 · Updated last year
amazon-science / textadain-robust-recognition
TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers
☆21 · Jul 26, 2022 · Updated 3 years ago
microsoft / act
AML Command Transfer. A lightweight tool to transfer any command line to Azure Machine Learning Services
☆20 · May 23, 2024 · Updated 2 years ago
Canjie-Luo / Real-300K
The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…
☆34 · Jun 21, 2022 · Updated 4 years ago
BordiaS / layoutlm
☆97 · Jul 13, 2020 · Updated 6 years ago
facebookresearch / grid-feats-vqa
Grid features pre-training code for visual question answering
☆269 · Sep 17, 2021 · Updated 4 years ago
amazon-science / glass-text-spotting
Official implementation for "GLASS: Global to Local Attention for Scene-Text Spotting" (ECCV'22)
☆102 · Jun 28, 2024 · Updated 2 years ago
microsoft / hnms
Hashing-based Non-Maximum Suppression
☆29 · Sep 16, 2020 · Updated 5 years ago
TencentARC / BTS
BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild
☆33 · Apr 16, 2024 · Updated 2 years ago
nttmdlab-nlp / VisualMRC
VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
☆57 · Mar 31, 2025 · Updated last year
ThunderVVV / RCLSTR
Official PyTorch implementation of `[ACMMM 2023]Relational Contrastive Learning for Scene Text Recognition`
☆17 · Sep 22, 2023 · Updated 2 years ago
clin1223 / MTVM
[ECCV 2022] Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation
☆19 · Jul 18, 2022 · Updated 4 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆69 · Jun 10, 2021 · Updated 5 years ago
shengtao96 / CentripetalText
☆29 · Aug 31, 2022 · Updated 3 years ago
rentainhe / TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
☆68 · Oct 11, 2021 · Updated 4 years ago
Actasidiot / EFIFSTR
[ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition
☆44 · Nov 30, 2020 · Updated 5 years ago
SHI-Labs / Rethinking-Text-Segmentation
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
☆275 · Dec 2, 2023 · Updated 2 years ago