opendatalab/Miner-PDF-Benchmark



MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-model data scenarios.

☆24

Alternatives and similar repositories for Miner-PDF-Benchmark

Users interested in Miner-PDF-Benchmark are comparing it to the libraries listed below.


magicpdf / Magic-Doc
Converts documents (PDF/HTML/DOC/DOCX/PPT/PPTX) to Markdown.
☆49 · Jul 23, 2024 · Updated 2 years ago
opendatalab / UniMERNet
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
☆494 · Sep 28, 2025 · Updated 10 months ago
felix-schmitt / MathNet
MathNet: A Data-Centric Approach, Dataset and Benchmark Model to Advance Mathematical Expression Recognition
☆10 · Mar 19, 2025 · Updated last year
InternScience / StructEqTable-Deploy
A High-efficiency Open-source Toolkit for Table-to-Latex Task
☆276 · Dec 6, 2025 · Updated 7 months ago
microsoft / ArxivFormula
This repo is used to release the ArxivFormula dataset.
☆35 · Nov 12, 2024 · Updated last year
opendatalab / mineru-vl-utils
A Python package for interacting with the MinerU Vision-Language Model.
☆134 · Jun 11, 2026 · Updated last month
breezedeus / CnMFD_Dataset
Chinese Mathematical Formula Detection (MFD) Dataset: a dataset for detecting mathematical formulas in Chinese-language documents
☆35 · Dec 21, 2022 · Updated 3 years ago
fooSynaptic / transformerForTasks
Implemented transformer NN block for machine translation, text classification, natural language inference as well as machine reading compr…
☆11 · Jul 14, 2026 · Updated 2 weeks ago
MigoXLab / coderio
A multi-agent design-to-code tool that generates production-ready React code with high visual fidelity and iterative validation.
☆111 · May 22, 2026 · Updated 2 months ago
FreeOCR-AI / layoutreader
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order.
☆323 · Aug 15, 2025 · Updated 11 months ago
dinobby / MAgICoRE
☆23 · Sep 19, 2024 · Updated last year
PremiLab-Math / MathCheck
[ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
☆34 · Oct 23, 2024 · Updated last year
Big-Brother-Pikachu / Where2edit
Official PyTorch implementation for "Where You Edit is What You Get: Text-Guided Image Editing with Region-Based Attention" (Pattern Reco…
☆10 · Oct 1, 2024 · Updated last year
labsyspharm / DRIADrc
Resources for Drug Repurposing In Alzheimer's Disease (DRIAD) work
☆11 · Mar 4, 2021 · Updated 5 years ago
nvidia-china-sae / WholeGraph
☆11 · Mar 4, 2021 · Updated 5 years ago
TencentCloudADP / youtu-parsing
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding
☆69 · Jun 15, 2026 · Updated last month
Alpha-Innovator / DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
☆156 · Jan 13, 2025 · Updated last year
hallogameboy / QDS-Transformer
☆16 · Sep 28, 2020 · Updated 5 years ago
poloclub / unitable
UniTable: Towards a Unified Table Foundation Model
☆534 · Apr 21, 2026 · Updated 3 months ago
benpry / chain-of-thought-metaphor
This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…
☆14 · Apr 28, 2023 · Updated 3 years ago
opendatalab / Meta-rater
[ACL 2025 Best Theme Paper] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for…
☆195 · Aug 29, 2025 · Updated 11 months ago
daandouwe / ngram-lm
A simple n-gram language model.
☆12 · Sep 11, 2018 · Updated 7 years ago
opendatalab / OmniDocBench
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
☆1,925 · Updated this week
ruc-datalab / SC-prompt
☆12 · May 13, 2023 · Updated 3 years ago
sauc-abadal / ALT
Official repository for ALT (ALignment with Textual feedback).
☆10 · Jul 25, 2024 · Updated 2 years ago
laihuiyuan / mFLAG
Multi-Figurative Language Generation (COLING 2022)
☆12 · Jan 30, 2023 · Updated 3 years ago
lwachowiak / Metaphor-Extraction-With-GPT-3
Code for our ACL'23 paper on how to identify metaphor mappings with the help of GPT-3
☆12 · May 21, 2025 · Updated last year
lilingxi01 / nougat-replication
A full codebase for replicating the results of Nougat from downloading arXiv dataset to the final evaluation. It also contains a few fixe…
☆11 · Dec 11, 2023 · Updated 2 years ago
robinchew / workshop_notes
☆14 · Mar 13, 2023 · Updated 3 years ago
zsc / qubit-fpga
An implementation of the Deutsch–Jozsa algorithm on an FPGA.
☆15 · Nov 30, 2020 · Updated 5 years ago
Abbey4799 / auctionroom-socket-udp-pyqt
This project implements a UDP-based networked auction house program with a client and a server. Language: Python 3; UI: PyQt5; multithreaded.
☆11 · Mar 27, 2020 · Updated 6 years ago
vikas95 / AIR-retriever
AIR retriever for Multi-Hop QA (ACL 2020 paper)
☆30 · Jul 18, 2020 · Updated 6 years ago
zxytim / pdf2images
Convert PDF files into per-page images.
☆13 · Apr 18, 2020 · Updated 6 years ago
pany8125 / ShareGPTQAExtractor-mnbvc
MNBVC project: ShareGPT corpus cleaning
☆16 · Oct 4, 2023 · Updated 2 years ago
360AILAB-NLP / 360LayoutAnalysis
360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute
☆305 · Sep 10, 2024 · Updated last year
steven-ccq / VisualReasoner
[EMNLP 2024] Official repository for paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"
☆22 · Oct 15, 2024 · Updated last year
SWHL / TableRecognitionMetric
Computes benchmark metrics for table structure recognition.
☆31 · Dec 2, 2025 · Updated 7 months ago
SteinOveHelset / codingnews
☆10 · Jan 31, 2021 · Updated 5 years ago
IslamKHALIL / Home-Visits-Manager
☆16 · Sep 20, 2015 · Updated 10 years ago