opendatalab/OmniDocBench

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

☆1,914

Alternatives and similar repositories for OmniDocBench

Users that are interested in OmniDocBench are comparing it to the libraries listed below.

opendatalab / DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
☆2,235 · Apr 14, 2025 · Updated last year
studio-dots-ai / dots.ocr
Multilingual Document Layout Parsing in a Single Vision-Language Model
☆9,028 · Mar 24, 2026 · Updated 4 months ago
Yuliang-Liu / MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆873 · Updated this week
opendatalab / UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
☆492 · Sep 28, 2025 · Updated 9 months ago
bytedance / Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
☆9,039 · Mar 25, 2026 · Updated 4 months ago
opendatalab / MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
☆75,694 · Updated this week
Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆6,607 · Updated this week
opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,806 · Jan 3, 2025 · Updated last year
RapidAI / RapidTable
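An inference library for sequence-based table recognition algorithms, integrating PP-Structure, modelscope, and other table recognition algorithms.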
☆432 · Apr 23, 2026 · Updated 3 months ago
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆19,182 · Mar 25, 2026 · Updated 4 months ago
Tencent-Hunyuan / HunyuanOCR
HunyuanOCR-1.5: Making Lightweight OCR VLMs Faster and Better
☆1,881 · Updated this week
chatdoc-com / OCRFlux
OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…
☆2,523 · Apr 14, 2026 · Updated 3 months ago
deepseek-ai / DeepSeek-OCR
Contexts Optical Compression
☆23,675 · Jan 27, 2026 · Updated 5 months ago
intsig-textin / markdown_tester
To try textin document parsing, click https://cc.co/16YSIy
☆129 · Jun 28, 2025 · Updated last year
opendatalab / mineru-vl-utils
A Python package for interacting with the MinerU Vision-Language Model.
☆136 · Jun 11, 2026 · Updated last month
Ucas-HaoranWei / GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆8,158 · Feb 10, 2025 · Updated last year
opendatalab / OHR-Bench
(ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
☆104 · Dec 3, 2025 · Updated 7 months ago
docling-project / docling
Get your documents ready for gen AI
☆63,762 · Updated this week
FreeOCR-AI / layoutreader
A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.
☆322 · Aug 15, 2025 · Updated 11 months ago
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,843 · Updated this week
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆86,238 · Updated this week
ucaslcl / Fox
Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"
☆196 · May 31, 2024 · Updated 2 years ago
illuin-tech / colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
☆2,707 · Jul 13, 2026 · Updated last week
AlibabaResearch / AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…
☆1,833 · Mar 17, 2026 · Updated 4 months ago
Topdu / OpenOCR
OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commer…
☆1,418 · May 20, 2026 · Updated 2 months ago
NanoNets / docext
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
☆2,032 · Mar 17, 2026 · Updated 4 months ago
getomni-ai / benchmark
OCR Benchmark
☆640 · Oct 21, 2025 · Updated 9 months ago
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,149 · Updated this week
InternScience / StructEqTable-Deploy
A High-efficiency Open-source Toolkit for Table-to-Latex Task
☆276 · Dec 6, 2025 · Updated 7 months ago
infly-ai / INF-MLLM
INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced document intelligence.
☆237 · Updated this week
X-PLUG / mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
☆2,410 · May 30, 2025 · Updated last year
alibaba / Logics-Parsing
☆1,394 · May 13, 2026 · Updated 2 months ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,664 · Jan 30, 2026 · Updated 5 months ago
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,946 · Updated this week
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,315 · Updated this week
Tencent / POINTS-Reader
☆197 · Dec 7, 2025 · Updated 7 months ago
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean…
☆15,197 · Updated this week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆87,138 · Updated this week
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,733 · Updated this week