bytedance / VTVQA
Towards Video Text Visual Question Answering: Benchmark and Baseline
☆38 Updated last year
Alternatives and similar repositories for VTVQA
Users who are interested in VTVQA are comparing it to the repositories listed below.
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation) ☆89 Updated 2 years ago
- [SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. ☆133 Updated 3 years ago
- A Unified Framework for Video-Language Understanding ☆60 Updated 2 years ago
- ☆110 Updated 2 years ago
- Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…