MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine widely-used yet low-resource languages.
☆64May 15, 2025Updated last year
Alternatives and similar repositories for MTVQA
Users that are interested in MTVQA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer☆55Jun 14, 2024Updated 2 years ago
- This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy☆51Oct 16, 2024Updated last year
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)☆850Updated this week
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer☆78Apr 9, 2024Updated 2 years ago
- ☆18Jun 12, 2024Updated 2 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆17Oct 30, 2022Updated 3 years ago
- ☆101Dec 23, 2024Updated last year
- ☆42Sep 2, 2023Updated 2 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆84Jan 30, 2023Updated 3 years ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆214Sep 26, 2024Updated last year
- The official implementation of SPTS v2: Single-Point Text Spotting☆138Jun 29, 2023Updated 2 years ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.☆32Feb 26, 2025Updated last year
- ☆11Nov 21, 2024Updated last year
- ☆17Feb 22, 2024Updated 2 years ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?“☆74May 19, 2025Updated last year
- ☆70Jan 9, 2024Updated 2 years ago
- Contrast-guided Feature Adjustment Module for Visual Information Extraction☆30May 23, 2023Updated 3 years ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)☆201Jun 17, 2024Updated last year
- Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio…☆14Jan 25, 2024Updated 2 years ago
- An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"☆81Oct 14, 2023Updated 2 years ago
- The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023)☆26Sep 3, 2023Updated 2 years ago
- ☆38Oct 7, 2023Updated 2 years ago
- ☆14May 26, 2023Updated 3 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022☆13Nov 25, 2022Updated 3 years ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆138May 16, 2025Updated last year
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.☆75Jun 11, 2024Updated 2 years ago
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs☆39Mar 9, 2025Updated last year
- Data Programming for Text Detection in Documents using SPEAR☆12Mar 26, 2025Updated last year
- A lightweight framework for evaluating visual-language models.☆41Apr 20, 2026Updated last month
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated 2 years ago
- The HierText dataset contains ~12k images from the Open Images dataset v6 with large amount of text entities. We provide word, line and p…☆312Dec 2, 2024Updated last year
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆4,218Updated this week
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆95Jan 7, 2025Updated last year
- Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)☆184Dec 23, 2023Updated 2 years ago
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"☆303May 22, 2025Updated last year
- ☆88Jan 10, 2024Updated 2 years ago
- This repository contains the files used for our Interspeech 2017 paper.☆16May 30, 2017Updated 9 years ago
- ☆48Sep 5, 2024Updated last year
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)☆106Mar 31, 2025Updated last year