MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine widely-used yet low-resource languages.
☆64May 15, 2025Updated 11 months ago
Alternatives and similar repositories for MTVQA
Users that are interested in MTVQA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer☆55Jun 14, 2024Updated last year
- This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy☆51Oct 16, 2024Updated last year
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)☆817Apr 9, 2026Updated last week
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer☆78Apr 9, 2024Updated 2 years ago
- ☆18Jun 12, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆17Oct 30, 2022Updated 3 years ago
- ☆102Dec 23, 2024Updated last year
- ☆42Sep 2, 2023Updated 2 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆83Jan 30, 2023Updated 3 years ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆206Sep 26, 2024Updated last year
- The official implementation of SPTS v2: Single-Point Text Spotting☆140Jun 29, 2023Updated 2 years ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.☆32Feb 26, 2025Updated last year
- [ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition☆44Nov 30, 2020Updated 5 years ago
- ☆17Feb 22, 2024Updated 2 years ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?“☆73May 19, 2025Updated 10 months ago
- ☆70Jan 9, 2024Updated 2 years ago
- Contrast-guided Feature Adjustment Module for Visual Information Extraction☆30May 23, 2023Updated 2 years ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)☆201Jun 17, 2024Updated last year
- Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio…☆14Jan 25, 2024Updated 2 years ago
- An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"☆81Oct 14, 2023Updated 2 years ago
- ☆36Oct 7, 2023Updated 2 years ago
- ☆14May 26, 2023Updated 2 years ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Sep 25, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022☆13Nov 25, 2022Updated 3 years ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆133May 16, 2025Updated 11 months ago
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.☆74Jun 11, 2024Updated last year
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs☆38Mar 9, 2025Updated last year
- Data Programming for Text Detection in Documents using SPEAR☆12Mar 26, 2025Updated last year
- A lightweight framework for evaluating visual-language models.☆41Apr 9, 2026Updated last week
- The HierText dataset contains ~12k images from the Open Images dataset v6 with large amount of text entities. We provide word, line and p…☆307Dec 2, 2024Updated last year
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆95Jan 7, 2025Updated last year
- Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)☆184Dec 23, 2023Updated 2 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"☆296May 22, 2025Updated 10 months ago
- ☆88Jan 10, 2024Updated 2 years ago
- This repository contains the files used for our Interspeech 2017 paper.☆16May 30, 2017Updated 8 years ago
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)☆105Mar 31, 2025Updated last year
- ☆48Sep 5, 2024Updated last year
- The largest VQA dataset for Vietnamese. Related to the text content in the image.☆19Apr 9, 2025Updated last year
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models☆27Aug 11, 2024Updated last year