uakarsh / latrLinks

Implementation of LaTr: Layout-aware transformer for scene-text VQA,a novel multimodal architecture for Scene Text Visual Question Answering (STVQA)

☆55

Alternatives and similar repositories for latr

Users that are interested in latr are comparing it to the libraries listed below

Sorting:

ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]
☆57Updated 3 years ago
microsoft / TAP
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
☆73Updated 2 years ago
ChenyuGAO-CS / SMA
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11Updated 3 years ago
yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆65Updated 4 years ago
furkanbiten / stvqa_amazon_ocr
STVQA and TextVQA OCR results from Amazon Text in Image pipeline
☆11Updated 3 years ago
xinke-wang / Awesome-Text-VQA
☆188Updated last year
bytedance / VTVQA
Towards Video Text Visual Question Answering: Benchmark and Baseline
☆38Updated last year
HAWLYQ / Qc-TextCap
☆16Updated 3 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆89Updated 2 years ago
wzk1015 / CNMT
[AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
☆24Updated 2 years ago
AndresPMD / StacMR
Scene Text Aware Cross Modal Retrieval (StacMR)
☆24Updated 4 years ago
xiaojino / RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10Updated 2 years ago
MCLAB-OCR / KnowledgeMiningWithSceneText
☆38Updated 2 years ago
sushizixin / CLIP4IDC
CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)
☆36Updated 3 years ago
zengyan-97 / CCLM
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023))
☆92Updated 2 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆68Updated 4 years ago
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆36Updated last year
ronghanghu / mmf
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45Updated 4 years ago
zhangxuying1004 / RSTNet
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123Updated 2 years ago
naver-ai / tablevqabench
☆45Updated last year
MAEHCM / ICL-D3IE
Code for ICCV 2023 Paper : “ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction”
☆54Updated 2 years ago
soloist97 / densecap-pytorch
A simplified pytorch version of densecap
☆42Updated 11 months ago
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆60Updated 3 years ago
antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆124Updated 2 years ago
Yushi-Hu / PromptCap
natual language guided image captioning
☆86Updated last year
YuanEZhou / CBTrans
☆23Updated 3 years ago
Cuberick-Orion / CIRPLANT
Official implementation of the Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) | ICCV 2021 - Image Retrieval o…
☆40Updated last year
SjokerLily / awesome-image-captioning
A paper list of image captioning.
☆22Updated 3 years ago
weijiawu / TransVTSpotter
A new video text spotting framework with Transformer
☆78Updated 3 years ago
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆207Updated 2 years ago