LukeForeverYoung/UReader

LukeForeverYoung / UReader

☆142

Alternatives and similar repositories for UReader

Users who are interested in UReader are comparing it to the libraries listed below.

X-PLUG / mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
☆2,409 · May 30, 2025 · Updated last year
HCIILAB / M6Doc
☆164 · May 8, 2025 · Updated last year
nttmdlab-nlp / InstructDoc
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162 · May 31, 2024 · Updated 2 years ago
mxin262 / ESTextSpotter
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
☆78 · Apr 9, 2024 · Updated 2 years ago
MAEHCM / ICL-D3IE
Code for ICCV 2023 Paper: “ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction”
☆54 · Aug 8, 2023 · Updated 2 years ago
Yuliang-Liu / MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆873 · Updated this week
SCUT-DLVCLab / RFUND
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…
☆21 · Dec 4, 2024 · Updated last year
Ucas-HaoranWei / Vary-tiny-600k
Vary-tiny codebase built on LAVIS (for training from scratch) and PDF image-text pair data (about 600k, including English/Chinese)
☆89 · Sep 21, 2024 · Updated last year
SCUT-DLVCLab / Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI.
☆209 · Mar 1, 2025 · Updated last year
SCUT-DLVCLab / GPT-4V_OCR
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
☆128 · Nov 13, 2023 · Updated 2 years ago
ZeningLin / PEneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
☆41 · Apr 7, 2025 · Updated last year
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268 · Jun 12, 2024 · Updated 2 years ago
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,949 · Jun 2, 2026 · Updated last month
ayumiymk / DiG
Official PyTorch implementation of `Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition`
☆74 · Feb 27, 2023 · Updated 3 years ago
ZeningLin / ViBERTgrid-PyTorch
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…
☆53 · Jan 9, 2024 · Updated 2 years ago
weijiawu / TransDETR
[IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer
☆114 · Mar 28, 2024 · Updated 2 years ago
weijiawu / TransVTSpotter
A new video text spotting framework with Transformer
☆82 · May 23, 2022 · Updated 4 years ago
lcy0604 / CTRNet
This repository is the implementation of "Don't Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Contex…
☆97 · Feb 21, 2023 · Updated 3 years ago
microsoft / CompHRDoc
Datasets and Evaluation Scripts for CompHRDoc
☆59 · Feb 25, 2025 · Updated last year
uakarsh / TiLT-Implementation
Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.
☆18 · Apr 23, 2023 · Updated 3 years ago
large-ocr-model / large-ocr-model.github.io
☆189 · Feb 27, 2024 · Updated 2 years ago
guoxy25 / Ocean-OCR
☆48 · Feb 7, 2025 · Updated last year
rossumai / docile
DocILE: Document Information Localization and Extraction Benchmark
☆149 · Jun 17, 2026 · Updated last month
Mountchicken / Union14M
[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective
☆206 · Nov 1, 2023 · Updated 2 years ago
InternScience / SimChart9K
The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.
☆26 · Feb 22, 2024 · Updated 2 years ago
CyrilSterling / LPV
The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023)
☆26 · Sep 3, 2023 · Updated 2 years ago
dali92002 / SSL-OCR
Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023
☆30 · Jul 12, 2023 · Updated 3 years ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆68 · Nov 1, 2024 · Updated last year
Veason-silverbullet / ViTLP
[NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence
☆149 · Sep 10, 2024 · Updated last year
X-PLUG / mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,536 · Apr 2, 2025 · Updated last year
zlwang-cs / LASER-release
Repo for the paper: Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework
☆14 · May 31, 2023 · Updated 3 years ago
google-research / pix2struct
☆686 · Jul 8, 2026 · Updated 2 weeks ago
WenjinW / LATIN-Prompt
☆52 · May 28, 2024 · Updated 2 years ago
ViTAE-Transformer / SAMText
The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting"
☆16 · May 3, 2023 · Updated 3 years ago
harrytea / Awesome-Document-Understanding
Document Artificial Intelligence
☆201 · Sep 28, 2025 · Updated 9 months ago
Ucas-HaoranWei / Vary
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
☆1,889 · Dec 30, 2024 · Updated last year
yeungchenwa / OCR-SAM
[Open-Source Project] Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detect, recognize and segment text instance…
☆590 · Jan 30, 2024 · Updated 2 years ago
LayTextLLM / LayTextLLM
☆103 · Dec 23, 2024 · Updated last year
DocTron-hub / Chart-R1
Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner
☆24 · Aug 7, 2025 · Updated 11 months ago