MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine widely-used yet low-resource languages.
☆64 · May 15, 2025 · Updated last year
Alternatives and similar repositories for MTVQA
Users who are interested in MTVQA are comparing it to the repositories listed below.
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench) ☆840 · May 20, 2026 · Updated last week
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer ☆78 · Apr 9, 2024 · Updated 2 years ago
- ☆17 · Oct 30, 2022 · Updated 3 years ago
- ☆101 · Dec 23, 2024 · Updated last year
- ☆42 · Sep 2, 2023 · Updated 2 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆83 · Jan 30, 2023 · Updated 3 years ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆211 · Sep 26, 2024 · Updated last year
- The official implementation of SPTS v2: Single-Point Text Spotting ☆138 · Jun 29, 2023 · Updated 2 years ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details. ☆32 · Feb 26, 2025 · Updated last year
- [ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition ☆44 · Nov 30, 2020 · Updated 5 years ago
- ☆17 · Feb 22, 2024 · Updated 2 years ago
- ☆12 · Jun 29, 2024 · Updated last year
- ☆70 · Jan 9, 2024 · Updated 2 years ago
- Contrast-guided Feature Adjustment Module for Visual Information Extraction ☆30 · May 23, 2023 · Updated 3 years ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI) ☆201 · Jun 17, 2024 · Updated last year
- Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio… ☆14 · Jan 25, 2024 · Updated 2 years ago
- An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis" ☆81 · Oct 14, 2023 · Updated 2 years ago
- The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023) ☆27 · Sep 3, 2023 · Updated 2 years ago
- ☆38 · Oct 7, 2023 · Updated 2 years ago
- ☆14 · May 26, 2023 · Updated 3 years ago
- Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022 ☆13 · Nov 25, 2022 · Updated 3 years ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆137 · May 16, 2025 · Updated last year
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting. ☆75 · Jun 11, 2024 · Updated last year
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs ☆38 · Mar 9, 2025 · Updated last year
- Data Programming for Text Detection in Documents using SPEAR ☆12 · Mar 26, 2025 · Updated last year
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format. ☆26 · Feb 22, 2024 · Updated 2 years ago
- The HierText dataset contains ~12k images from the Open Images dataset v6 with large amount of text entities. We provide word, line and p… ☆312 · Dec 2, 2024 · Updated last year
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆95 · Jan 7, 2025 · Updated last year
- Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021) ☆184 · Dec 23, 2023 · Updated 2 years ago
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆300 · May 22, 2025 · Updated last year
- ☆88 · Jan 10, 2024 · Updated 2 years ago
- This repository contains the files used for our Interspeech 2017 paper. ☆16 · May 30, 2017 · Updated 8 years ago
- ☆48 · Sep 5, 2024 · Updated last year
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023) ☆106 · Mar 31, 2025 · Updated last year
- The largest VQA dataset for Vietnamese. Related to the text content in the image. ☆19 · Apr 9, 2025 · Updated last year
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆26 · Aug 11, 2024 · Updated last year
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks ☆4,145 · May 15, 2026 · Updated last week
- ☆16 · Oct 21, 2024 · Updated last year
- ☆142 · Feb 13, 2024 · Updated 2 years ago