rubenpt91/MP-DocVQA-Framework

☆72

Alternatives and similar repositories for MP-DocVQA-Framework

Users interested in MP-DocVQA-Framework are comparing it to the repositories listed below.


uakarsh / TiLT-Implementation
Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.
☆18 · Apr 23, 2023 · Updated 3 years ago
adlnlp / pdfvqa
☆18 · Jun 12, 2024 · Updated 2 years ago
DS3Lab / WordScape
The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models.
☆42 · Dec 7, 2023 · Updated 2 years ago
WenjinW / LATIN-Prompt
☆52 · May 28, 2024 · Updated 2 years ago
NExTplusplus / TAT-DQA
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
☆24 · Sep 17, 2024 · Updated last year
applicaai / kleister-charity
☆40 · Aug 18, 2021 · Updated 4 years ago
furkanbiten / idl_data
OCR Annotations from Amazon Textract for Industry Documents Library
☆103 · Aug 20, 2022 · Updated 3 years ago
nttmdlab-nlp / VisualMRC
VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
☆57 · Mar 31, 2025 · Updated last year
nttmdlab-nlp / SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
☆106 · Mar 31, 2025 · Updated last year
ZZR8066 / GraphDoc
☆45 · Jul 18, 2022 · Updated 4 years ago
gsoykan / comics_text_plus
Official repository of the paper: "A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition"
☆26 · Jul 10, 2023 · Updated 3 years ago
anisha2102 / docvqa
Document Visual Question Answering
☆130 · Jul 30, 2020 · Updated 5 years ago
rossumai / docile
DocILE: Document Information Localization and Extraction Benchmark
☆149 · Jun 17, 2026 · Updated last month
bytedance / MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…
☆64 · May 15, 2025 · Updated last year
naver-ai / cream
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023
☆46 · Jun 11, 2024 · Updated 2 years ago
jfma-USTC / HRDoc
Dataset and scripts for HRDoc
☆42 · Jun 21, 2023 · Updated 3 years ago
DataArcTech / ChartBench
☆16 · May 15, 2025 · Updated last year
InternScience / SimChart9K
The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.
☆26 · Feb 22, 2024 · Updated 2 years ago
HenryJunW / TAG
☆22 · Dec 8, 2022 · Updated 3 years ago
LukeForeverYoung / UReader
☆142 · Feb 13, 2024 · Updated 2 years ago
vis-nlp / ChartQA
☆260 · Apr 18, 2025 · Updated last year
allanj / LayoutLMv3-DocVQA
Example codebase for fine-tuning LayoutLMv3 on DocVQA
☆53 · Sep 19, 2022 · Updated 3 years ago
bytedance / SPTSv2
The official implementation of SPTS v2: Single-Point Text Spotting
☆138 · Jun 29, 2023 · Updated 3 years ago
due-benchmark / baselines
The code related to the baselines from NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark."
☆36 · Mar 2, 2023 · Updated 3 years ago
IBM / KVP10k
Repository for the KVP10k dataset
☆23 · Sep 18, 2025 · Updated 10 months ago
nttmdlab-nlp / InstructDoc
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162 · May 31, 2024 · Updated 2 years ago
mayubo2333 / MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆149 · Sep 28, 2025 · Updated 9 months ago
shirlyliu64 / ConvBench
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
☆16 · Sep 27, 2024 · Updated last year
Xiaomeng-Yang / STR_benchmark_cleansed
☆14 · May 26, 2023 · Updated 3 years ago
clovaai / bros
☆163 · Dec 27, 2022 · Updated 3 years ago
IITB-LEAP-OCR / TEXTRON
Data Programming for Text Detection in Documents using SPEAR
☆12 · Mar 26, 2025 · Updated last year
LARS-research / TREFE
Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022
☆13 · Nov 25, 2022 · Updated 3 years ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆28 · Dec 16, 2024 · Updated last year
xinke-wang / Awesome-Text-VQA
☆188 · May 8, 2024 · Updated 2 years ago
hesedjds / SQUAT
The official code for Devil's on the Edges: Selective Quad Attention for Scene Graph Generation, CVPR2023.
☆25 · Jul 17, 2023 · Updated 3 years ago
amazon-science / glass-text-spotting
Official implementation for "GLASS: Global to Local Attention for Scene-Text Spotting" (ECCV'22)
☆102 · Jun 28, 2024 · Updated 2 years ago
applicaai / kleister-nda
☆61 · Aug 18, 2021 · Updated 4 years ago
phucty / wtabhtml
Tool to parse wiki tables from the HTML dump of Wikipedia
☆11 · Jun 12, 2022 · Updated 4 years ago
uakarsh / latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…
☆56 · Updated this week