X-PLUG/mPLUG-DocOwl

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/X-PLUG/mPLUG-DocOwl)

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

☆2,408

Alternatives and similar repositories for mPLUG-DocOwl

Users that are interested in mPLUG-DocOwl are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

LukeForeverYoung / UReader
View on GitHub
☆142Feb 13, 2024Updated 2 years ago
AlibabaResearch / AdvancedLiterateMachinery
View on GitHub
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…
☆1,833Mar 17, 2026Updated 4 months ago
Ucas-HaoranWei / GOT-OCR2.0
View on GitHub
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆8,154Feb 10, 2025Updated last year
Yuliang-Liu / Monkey
View on GitHub
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,949Jun 2, 2026Updated last month
X-PLUG / mPLUG-Owl
View on GitHub
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,536Apr 2, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Ucas-HaoranWei / Vary
View on GitHub
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
☆1,889Dec 30, 2024Updated last year
Yuliang-Liu / MultimodalOCR
View on GitHub
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆873Updated this week
QwenLM / Qwen-VL
View on GitHub
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,712Aug 7, 2024Updated last year
datalab-to / surya
View on GitHub
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,135Updated this week
OpenGVLab / InternVL
View on GitHub
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
☆10,099Sep 22, 2025Updated 10 months ago
clovaai / donut
View on GitHub
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
☆6,907Jul 11, 2024Updated 2 years ago
InternLM / InternLM-XComposer
View on GitHub
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921May 26, 2025Updated last year
SCUT-DLVCLab / Document-AI-Recommendations
View on GitHub
Algorithms, papers, datasets, performance comparisons for Document AI.
☆209Mar 1, 2025Updated last year
ucaslcl / Fox
View on GitHub
official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
☆196May 31, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
OpenBMB / MiniCPM-V
View on GitHub
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,965Jun 25, 2026Updated 3 weeks ago
microsoft / unilm
View on GitHub
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,170Jan 23, 2026Updated 6 months ago
haotian-liu / LLaVA
View on GitHub
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,937Aug 12, 2024Updated last year
zai-org / CogVLM
View on GitHub
a state-of-the-art-level open visual language model | 多模态预训练模型
☆6,744May 29, 2024Updated 2 years ago
illuin-tech / colpali
View on GitHub
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
☆2,706Jul 13, 2026Updated last week
LingyvKong / OneChart
View on GitHub
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
☆266Apr 14, 2025Updated last year
SALT-NLP / LLaVAR
View on GitHub
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268Jun 12, 2024Updated 2 years ago
facebookresearch / nougat
View on GitHub
Implementation of Nougat Neural Optical Understanding for Academic Documents
☆10,047Feb 21, 2025Updated last year
large-ocr-model / large-ocr-model.github.io
View on GitHub
☆189Feb 27, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
opendatalab / PDF-Extract-Kit
View on GitHub
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,801Jan 3, 2025Updated last year
Veason-silverbullet / ViTLP
View on GitHub
[NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence
☆149Sep 10, 2024Updated last year
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,954Jul 2, 2026Updated 2 weeks ago
QwenLM / Qwen3-VL
View on GitHub
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,645Jan 30, 2026Updated 5 months ago
HCIILAB / M6Doc
View on GitHub
☆164May 8, 2025Updated last year
nttmdlab-nlp / InstructDoc
View on GitHub
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162May 31, 2024Updated 2 years ago
poloclub / unitable
View on GitHub
UniTable: Towards a Unified Table Foundation Model
☆533Apr 21, 2026Updated 3 months ago
harrytea / Awesome-Document-Understanding
View on GitHub
Document Artifical Intelligence
☆201Sep 28, 2025Updated 9 months ago
opendatalab / DocLayout-YOLO
View on GitHub
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
☆2,234Apr 14, 2025Updated last year
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
LayTextLLM / LayTextLLM
View on GitHub
☆103Dec 23, 2024Updated last year
FlagOpen / FlagEmbedding
View on GitHub
Retrieval and Retrieval-augmented LLMs
☆11,968Apr 22, 2026Updated 3 months ago
InternLM / xtuner
View on GitHub
A Next-Generation Training Engine Built for Ultra-Large MoE Models
☆5,163Updated this week
adithya-s-k / omniparse
View on GitHub
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
☆7,639Dec 12, 2025Updated 7 months ago
ATH-MaaS / Ovis
View on GitHub
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆1,470Jul 15, 2026Updated last week
salesforce / LAVIS
View on GitHub
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,254Jun 2, 2026Updated last month
zai-org / CogVLM2
View on GitHub
GPT4V-level open-source multi-modal model based on Llama3-8B
☆2,435Mar 3, 2025Updated last year