MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine widely-used yet low-resource languages.
☆64 · May 15, 2025 · Updated 10 months ago
Alternatives and similar repositories for MTVQA
Users who are interested in MTVQA are comparing it to the libraries listed below.
- The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer ☆55 · Jun 14, 2024 · Updated last year
- This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy ☆50 · Oct 16, 2024 · Updated last year
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench) ☆802 · Jul 5, 2025 · Updated 8 months ago
- ☆18 · Jun 12, 2024 · Updated last year
- ☆17 · Oct 30, 2022 · Updated 3 years ago
- ☆42 · Sep 2, 2023 · Updated 2 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆83 · Jan 30, 2023 · Updated 3 years ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆205 · Sep 26, 2024 · Updated last year
- The official implementation of SPTS v2: Single-Point Text Spotting ☆140 · Jun 29, 2023 · Updated 2 years ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details. ☆32 · Feb 26, 2025 · Updated last year
- ☆11 · Nov 21, 2024 · Updated last year
- [ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition ☆44 · Nov 30, 2020 · Updated 5 years ago
- ☆17 · Feb 22, 2024 · Updated 2 years ago
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?” ☆73 · May 19, 2025 · Updated 10 months ago
- ☆69 · Jan 9, 2024 · Updated 2 years ago
- Contrast-guided Feature Adjustment Module for Visual Information Extraction ☆30 · May 23, 2023 · Updated 2 years ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI) ☆201 · Jun 17, 2024 · Updated last year
- The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023) ☆27 · Sep 3, 2023 · Updated 2 years ago
- Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio… ☆14 · Jan 25, 2024 · Updated 2 years ago
- An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis" ☆81 · Oct 14, 2023 · Updated 2 years ago
- ☆14 · May 26, 2023 · Updated 2 years ago
- ☆36 · Oct 7, 2023 · Updated 2 years ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo … ☆29 · Sep 25, 2024 · Updated last year
- The first work for cross-domain open-vocabulary action recognition with a benchmark ☆20 · May 27, 2024 · Updated last year
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆131 · May 16, 2025 · Updated 10 months ago
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting. ☆74 · Jun 11, 2024 · Updated last year
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs ☆38 · Mar 9, 2025 · Updated last year
- Data Programming for Text Detection in Documents using SPEAR ☆12 · Mar 26, 2025 · Updated last year
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format. ☆26 · Feb 22, 2024 · Updated 2 years ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆95 · Jan 7, 2025 · Updated last year
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆292 · May 22, 2025 · Updated 10 months ago
- ☆88 · Jan 10, 2024 · Updated 2 years ago
- This repository contains the files used for our Interspeech 2017 paper. ☆16 · May 30, 2017 · Updated 8 years ago
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023) ☆105 · Mar 31, 2025 · Updated 11 months ago
- ☆48 · Sep 5, 2024 · Updated last year
- The largest VQA dataset for Vietnamese. Related to the text content in the image. ☆19 · Apr 9, 2025 · Updated 11 months ago
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks ☆3,958 · Updated this week
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆130 · Sep 28, 2025 · Updated 5 months ago
- ☆16 · Oct 21, 2024 · Updated last year