bytedance/VTVQA

bytedance / VTVQA

Towards Video Text Visual Question Answering: Benchmark and Baseline

☆40

Alternatives and similar repositories for VTVQA

Users who are interested in VTVQA are comparing it to the repositories listed below.

weijiawu / TransDETR
[IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer
☆107 · Mar 28, 2024 · Updated last year
microsoft / TAP
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
☆72 · May 22, 2023 · Updated 2 years ago
uakarsh / latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…
☆56 · Oct 30, 2024 · Updated last year
zhaominyiz / STIRER
STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition -- ACMMM 2023
☆14 · Dec 2, 2024 · Updated last year
csguoh / KD-LTR
[MM2023] An official implementation of the paper "One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer"
☆16 · Nov 3, 2023 · Updated 2 years ago
weijiawu / TransVTSpotter
A new video text spotting framework with Transformer
☆78 · May 23, 2022 · Updated 3 years ago
Canjie-Luo / Real-300K
The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…
☆34 · Jun 21, 2022 · Updated 3 years ago
weijiawu / BOVText-Benchmark
[NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting
☆68 · Oct 9, 2023 · Updated 2 years ago
husterpzh / PSSR
Official code for the paper: "Perception and Semantic Aware Regularization for Sequential Confidence Calibration (CVPR2023)"
☆10 · May 15, 2024 · Updated last year
yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆65 · Sep 15, 2021 · Updated 4 years ago
LegalDocumentProcessing / FIR_Dataset_ICDAR2023
☆12 · Jun 11, 2023 · Updated 2 years ago
callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆28 · Dec 28, 2023 · Updated 2 years ago
xinke-wang / Awesome-Text-VQA
☆188 · May 8, 2024 · Updated last year
SooLab / REP-ERU
[ECCV2022] A PyTorch implementation of the paper "Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embo…
☆13 · Mar 20, 2023 · Updated 2 years ago
ChenyuGAO-CS / SMA
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11 · Nov 30, 2021 · Updated 4 years ago
Caiyuan-Zheng / Consistency_Regularization_STR
Code for the paper "Pushing the Performance Limit of Scene Text Recognizer without Human Annotation", CVPR 2022.
☆28 · Jul 6, 2022 · Updated 3 years ago
zhaominyiz / C3-STISR
Official Code for 'C3-STISR: Scene Text Image Super-resolution with Triple Clues' - IJCAI 2022
☆63 · Nov 20, 2022 · Updated 3 years ago
PhoebusSi / SAR
Code for our ACL2021 paper: "Check It Again: Progressive Visual Question Answering via Visual Entailment"
☆31 · Nov 24, 2021 · Updated 4 years ago
ThunderVVV / RCLSTR
Official PyTorch implementation of `[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition`
☆17 · Sep 22, 2023 · Updated 2 years ago
ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021]
☆57 · Apr 5, 2022 · Updated 3 years ago
mjq11302010044 / RRPN_plusplus
RRPN++: Guidance Towards More Accurate Scene Text Detection
☆90 · Dec 3, 2021 · Updated 4 years ago
zhousheng97 / ViTXT-GQA
[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering
☆16 · Feb 16, 2026 · Updated 3 weeks ago
zhaominyiz / RFDA-PyTorch
Official Code for 'Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction' - ACM Multimedia202…
☆59 · Aug 5, 2022 · Updated 3 years ago
amazon-science / textadain-robust-recognition
TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers
☆21 · Jul 26, 2022 · Updated 3 years ago
jesbu1 / carl
GitHub repo for CARL: Cautious Adaptation for RL in Safety Critical Settings
☆14 · Nov 22, 2022 · Updated 3 years ago
weijiawu / SyntoReal_STD
HHH
☆36 · May 2, 2022 · Updated 3 years ago
Yuliang-Liu / VimTS
VimTS: A Unified Video and Image Text Spotter
☆78 · Nov 10, 2024 · Updated last year
wanghaisheng / ocr-arxiv-daily
☆18 · Jun 7, 2023 · Updated 2 years ago
AndresPMD / Pytorch-yolo-phoc
PyTorch implementation of the code from the ECCV 2018 paper "Single Shot Scene Text Retrieval"
☆13 · Dec 15, 2021 · Updated 4 years ago
DCGM / SoftCTC
This repository contains the source code for SoftCTC. The original paper can be found here: https://arxiv.org/abs/2212.02135
☆19 · Mar 7, 2023 · Updated 3 years ago
MAEHCM / AET
Code for AAAI 2023 Paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models”
☆18 · Dec 6, 2022 · Updated 3 years ago
simplify23 / TPS_PP
Official PyTorch implementation of TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition (IJCAI 2023)
☆44 · Aug 13, 2023 · Updated 2 years ago
PkuDavidGuan / CurvedSynthText
☆41 · Nov 30, 2019 · Updated 6 years ago
luyang-NWPU / HGA-STR
Code for the paper "A holistic representation guided attention network for scene text recognition", Neurocomputing 2020
☆17 · Dec 1, 2020 · Updated 5 years ago
VITA-Group / layerGraftedPretraining_ICLR23
[ICLR 2023] “Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian…
☆24 · Feb 16, 2023 · Updated 3 years ago
showlab / DemoVLP
[Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
☆22 · Mar 19, 2022 · Updated 3 years ago
Hxyz-123 / GoMatching
[NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
☆28 · May 29, 2025 · Updated 9 months ago
SakuraRiven / LANMS
Locality-Aware Non-Maximum Suppression (C++ version)
☆23 · Aug 31, 2021 · Updated 4 years ago
ZZR8066 / GraphDoc
☆45 · Jul 18, 2022 · Updated 3 years ago