harrytea/Awesome-Document-Understanding

Document Artificial Intelligence

☆201

Alternatives and similar repositories for Awesome-Document-Understanding

Users interested in Awesome-Document-Understanding are comparing it to the repositories listed below.

SCUT-DLVCLab / Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI.
☆209 · Mar 1, 2025 · Updated last year
bzluan / TextCoT
[ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"
☆45 · Feb 27, 2026 · Updated 5 months ago
harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆16 · Nov 28, 2024 · Updated last year
lzk9508 / DaFIR
The official code for "DaFIR: Distortion-Aware Representation Learning for Fisheye Image Rectification", TCSVT, 2023.
☆13 · May 30, 2025 · Updated last year
chongzhangFDU / ROOR
This is the official implementation to the EMNLP 2024 paper: Modeling Layout Reading Order as Ordering Relations for Visually-rich Docume…
☆32 · Jan 19, 2026 · Updated 6 months ago
fh2019ustc / DeepEraser
The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.
☆53 · Aug 26, 2024 · Updated last year
ucaslcl / Fox
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
☆197 · May 31, 2024 · Updated 2 years ago
khuangaf / Awesome-Chart-Understanding
A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Auto…
☆241 · Dec 17, 2025 · Updated 7 months ago
LukeForeverYoung / UReader
☆142 · Feb 13, 2024 · Updated 2 years ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆68 · Nov 1, 2024 · Updated last year
fh2019ustc / Awesome-Document-Image-Rectification
A comprehensive list of awesome document image rectification papers.
☆558 · Apr 15, 2026 · Updated 3 months ago
fh2019ustc / DocTr
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
☆436 · Jul 10, 2026 · Updated 2 weeks ago
LayTextLLM / LayTextLLM
☆103 · Dec 23, 2024 · Updated last year
Line-Kite / GraphLayoutLM
☆14 · Sep 6, 2024 · Updated last year
X-PLUG / mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
☆2,411 · May 30, 2025 · Updated last year
Tan-Junwen / awesome-table-structure-recognition
A Curated List of Awesome Table Structure Recognition (TSR) Research. Including models, papers, datasets and codes. Continuously updating…
☆232 · Sep 9, 2024 · Updated last year
tstanislawek / awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
☆1,526 · Jun 2, 2023 · Updated 3 years ago
AlibabaResearch / AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…
☆1,832 · Mar 17, 2026 · Updated 4 months ago
fh2019ustc / DocScanner
The official repo for “DocScanner: Robust Document Image Rectification with Progressive Learning”, IJCV, 2025.
☆338 · Jun 18, 2025 · Updated last year
entropy2333 / awesome-key-information-extraction
A curated list of papers about key information extraction.
☆107 · Jul 8, 2026 · Updated 3 weeks ago
SpursGoZmy / Table-LLaVA
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …
☆227 · Jun 12, 2025 · Updated last year
deepopinion / anls_star_metric
Official implementation of the ANLS* metric
☆25 · Updated this week
Yuliang-Liu / MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆873 · Jul 22, 2026 · Updated last week
HCIILAB / M6Doc
☆166 · May 8, 2025 · Updated last year
cv-small-snails / Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
☆405 · Dec 12, 2024 · Updated last year
FreeOCR-AI / layoutreader
A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.
☆323 · Aug 15, 2025 · Updated 11 months ago
jfkuang / CFAM
Contrast-guided Feature Adjustment Module for Visual Information Extraction
☆30 · May 23, 2023 · Updated 3 years ago
SCUT-DLVCLab / GPT-4V_OCR
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
☆128 · Nov 13, 2023 · Updated 2 years ago
kyegomez / Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
☆75 · Jul 20, 2026 · Updated last week
Ucas-HaoranWei / Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
☆1,890 · Dec 30, 2024 · Updated last year
BlueCrescent / DocLLM
Implementation of the DocLLM paper for Llama models.
☆13 · Apr 6, 2025 · Updated last year
Veason-silverbullet / ViTLP
[NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence
☆149 · Sep 10, 2024 · Updated last year
SII-sc22mc / DocFusion
A Unified Framework for Document Parsing Tasks (Including Document Layout Analysis, OCR, Formula Recognition, and Table Recognition)
☆15 · Jul 1, 2025 · Updated last year
nttmdlab-nlp / InstructDoc
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162 · May 31, 2024 · Updated 2 years ago
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,951 · Jun 2, 2026 · Updated last month
Xiaomeng-Yang / STR_benchmark_cleansed
☆14 · May 26, 2023 · Updated 3 years ago
liucun-zy / Pharos-ESG-A-Hierarchical-ToC-Based-Framework-for-ESG-Report-Parsing
A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Reports
☆16 · Nov 14, 2025 · Updated 8 months ago
bytedance / MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…
☆64 · May 15, 2025 · Updated last year
thinh-re / s-multimae
[ICPR-2024] S-MultiMAE - A Multi-Ground Truth approach for RGB-D Saliency Detection
☆11 · Dec 13, 2024 · Updated last year