MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine widely-used yet low-resource languages.
☆63May 15, 2025Updated 9 months ago
Alternatives and similar repositories for MTVQA
Users that are interested in MTVQA are comparing it to the libraries listed below
Sorting:
- ☆17Jun 12, 2024Updated last year
- The largest VQA dataset for Vietnamese. Related to the text content in the image.☆19Apr 9, 2025Updated 10 months ago
- Contrast-guided Feature Adjustment Module for Visual Information Extraction☆30May 23, 2023Updated 2 years ago
- ☆69Jan 9, 2024Updated 2 years ago
- The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023)☆27Sep 3, 2023Updated 2 years ago
- ☆17Oct 30, 2022Updated 3 years ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.☆32Feb 26, 2025Updated last year
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer☆78Apr 9, 2024Updated last year
- ☆17Feb 22, 2024Updated 2 years ago
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)☆796Jul 5, 2025Updated 8 months ago
- The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer☆55Jun 14, 2024Updated last year
- ☆42Sep 2, 2023Updated 2 years ago
- Data Programming for Text Detection in Documents using SPEAR☆12Mar 26, 2025Updated 11 months ago
- ☆102Dec 23, 2024Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆126Sep 28, 2025Updated 5 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆95Jan 7, 2025Updated last year
- The official implementation of SPTS v2: Single-Point Text Spotting☆140Jun 29, 2023Updated 2 years ago
- This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy☆50Oct 16, 2024Updated last year
- [ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition☆44Nov 30, 2020Updated 5 years ago
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning☆23Sep 17, 2024Updated last year
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models☆27Aug 11, 2024Updated last year
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.☆74Jun 11, 2024Updated last year
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆83Jan 30, 2023Updated 3 years ago
- Similarity Learning applied to Speaker Verification and Semantic Textual Similarity☆12Apr 8, 2020Updated 5 years ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated 2 years ago
- ☆48Sep 5, 2024Updated last year
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆203Sep 26, 2024Updated last year
- ☆36Oct 7, 2023Updated 2 years ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)☆201Jun 17, 2024Updated last year
- ☆16Jan 10, 2025Updated last year
- Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022☆13Nov 25, 2022Updated 3 years ago
- This repository contains the files used for our Interspeech 2017 paper.☆16May 30, 2017Updated 8 years ago
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs☆37Mar 9, 2025Updated 11 months ago
- Code for ICCV 2023 Paper : “ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction”☆54Aug 8, 2023Updated 2 years ago
- Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT Text detection (Pytorch), included converter from Pytorch -> O…☆33Aug 18, 2021Updated 4 years ago
- Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio…☆14Jan 25, 2024Updated 2 years ago
- ☆14May 26, 2023Updated 2 years ago
- [ICML-2025] We introduce Lie group Relative position Encodings (LieRE) that goes beyond RoPE in supporting n-dimensional inputs.☆30Aug 13, 2025Updated 6 months ago
- ✨ Beautiful OCR Project Team Code by Team DKT☆12Jun 23, 2021Updated 4 years ago