bytedance/MTVQA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/bytedance/MTVQA)

bytedance / MTVQA

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine widely-used yet low-resource languages.

☆64

Alternatives and similar repositories for MTVQA

Users that are interested in MTVQA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

bytedance / E2STR
View on GitHub
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
☆55Jun 14, 2024Updated 2 years ago
sakura2233565548 / TabPedia
View on GitHub
This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
☆51Oct 16, 2024Updated last year
mxin262 / ESTextSpotter
View on GitHub
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
☆78Apr 9, 2024Updated 2 years ago
Yuliang-Liu / MultimodalOCR
View on GitHub
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆873Updated this week
LayTextLLM / LayTextLLM
View on GitHub
☆103Dec 23, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
bino282 / ViNLP
View on GitHub
☆17Oct 30, 2022Updated 3 years ago
zzyhlyoko / DCTC
View on GitHub
☆42Sep 2, 2023Updated 2 years ago
joanrod / ocr-vqgan
View on GitHub
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…
☆84Jan 30, 2023Updated 3 years ago
MMStar-Benchmark / MMStar
View on GitHub
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆215Sep 26, 2024Updated last year
Actasidiot / EFIFSTR
View on GitHub
[ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition
☆44Nov 30, 2020Updated 5 years ago
tianyu-z / VCR
View on GitHub
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
☆32Feb 26, 2025Updated last year
bytedance / SPTSv2
View on GitHub
The official implementation of SPTS v2: Single-Point Text Spotting
☆138Jun 29, 2023Updated 3 years ago
GeorgeLuImmortal / PaDeLLM_NER
View on GitHub
☆11Nov 21, 2024Updated last year
core-mm / core-mm
View on GitHub
☆17Feb 22, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
rubenpt91 / MP-DocVQA-Framework
View on GitHub
☆72Jan 9, 2024Updated 2 years ago
jfkuang / CFAM
View on GitHub
Contrast-guided Feature Adjustment Module for Visual Information Extraction
☆30May 23, 2023Updated 3 years ago
wenwenyu / TCM
View on GitHub
Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)
☆202Jun 17, 2024Updated 2 years ago
NormXU / Layout2Graph
View on GitHub
An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"
☆82Oct 14, 2023Updated 2 years ago
CyrilSterling / LPV
View on GitHub
The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023)
☆26Sep 3, 2023Updated 2 years ago
kenchan0226 / FineGrainedFact
View on GitHub
Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio…
☆15Jan 25, 2024Updated 2 years ago
Xiaomeng-Yang / STR_benchmark_cleansed
View on GitHub
☆14May 26, 2023Updated 3 years ago
KunyuLin / XOV-Action
View on GitHub
The first work for cross-domain open-vocabulary action recognition with a benchmark
☆21Jul 9, 2026Updated 2 weeks ago
MMMU-Benchmark / MMMU
View on GitHub
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…
☆590Feb 12, 2026Updated 5 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
apple / ml-mia-bench
View on GitHub
This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
☆38Mar 9, 2025Updated last year
xmu-xiaoma666 / Multimodal-Open-O1
View on GitHub
Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…
☆28Sep 25, 2024Updated last year
georgepar / grnet_guide
View on GitHub
Guide for the slp group on how to use the Grnet cluster
☆11Apr 16, 2020Updated 6 years ago
mxin262 / Bridging-Text-Spotting
View on GitHub
(CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.
☆75Jun 11, 2024Updated 2 years ago
google-research-datasets / hiertext
View on GitHub
The HierText dataset contains ~12k images from the Open Images dataset v6 with large amount of text entities. We provide word, line and p…
☆316Dec 2, 2024Updated last year
gregor-ge / mBLIP
View on GitHub
☆88Jan 10, 2024Updated 2 years ago
open-compass / MMBench
View on GitHub
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆307May 22, 2025Updated last year
bajibabu / GlottGAN
View on GitHub
This repository contains the files used for our Interspeech 2017 paper.
☆16May 30, 2017Updated 9 years ago
FuxiaoLiu / MMC
View on GitHub
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
☆95Jan 7, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ku21fan / STR-Fewer-Labels
View on GitHub
Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)
☆185Dec 23, 2023Updated 2 years ago
qhnhynmm / ViOCRVQA-Dataset
View on GitHub
The largest VQA dataset for Vietnamese. Related to the text content in the image.
☆19Apr 9, 2025Updated last year
openaudiolab / LLaST
View on GitHub
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
☆26Aug 11, 2024Updated last year
nttmdlab-nlp / SlideVQA
View on GitHub
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
☆106Mar 31, 2025Updated last year
michelecafagna26 / cider
View on GitHub
Pythonic wrappers for Cider/CiderD evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended) which is more robust to gaming …
☆13Dec 4, 2025Updated 7 months ago
MME-Benchmarks / MME-RealWorld
View on GitHub
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆161Oct 21, 2025Updated 9 months ago
SCNU203 / GeoQA-Plus
View on GitHub
☆20May 14, 2024Updated 2 years ago