MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
☆24 · Dec 11, 2024 · Updated last year
Alternatives and similar repositories for Miner-PDF-Benchmark
Users that are interested in Miner-PDF-Benchmark are comparing it to the libraries listed below.
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆476 · Sep 28, 2025 · Updated 7 months ago
- A Python package for interacting with the MinerU Vision-Language Model. ☆120 · May 19, 2026 · Updated last week
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆277 · Dec 6, 2025 · Updated 5 months ago
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding ☆69 · Feb 10, 2026 · Updated 3 months ago
- This repo is used to release the ArxivFormula dataset. ☆35 · Nov 12, 2024 · Updated last year
- MathNet: A Data-Centric Approach, Dataset and Benchmark Model to Advance Mathematical Expression Recognition ☆10 · Mar 19, 2025 · Updated last year
- argparse extension for hpman ☆17 · Dec 4, 2022 · Updated 3 years ago
- Chinese Mathematical Formula Detection (MFD) Dataset: a dataset for detecting mathematical formulas in Chinese documents ☆34 · Dec 21, 2022 · Updated 3 years ago
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis" ☆16 · Sep 2, 2024 · Updated last year
- Continuous diffusion for layout generation ☆55 · Feb 19, 2025 · Updated last year
- Implemented a transformer NN block for machine translation, text classification, natural language inference, as well as machine reading compr… ☆11 · Mar 1, 2026 · Updated 2 months ago
- Repository for an initial POC of an NLP-based SQL adapter using an LLM. ☆10 · May 6, 2025 · Updated last year
- ☆37 · Jan 26, 2026 · Updated 4 months ago
- [ACL 2025] An official PyTorch implementation of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement ☆40 · May 28, 2025 · Updated 11 months ago
- Compute benchmark of table structure recognition. ☆28 · Dec 2, 2025 · Updated 5 months ago
- ☆23 · Sep 19, 2024 · Updated last year
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bboxes into reading order. ☆320 · Aug 15, 2025 · Updated 9 months ago
- Data browser based on S3: a visualization tool for data (json / jsonl / parquet / html / md, etc.) stored in S3. ☆85 · Apr 14, 2026 · Updated last month
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist ☆34 · Oct 23, 2024 · Updated last year
- TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition ☆32 · Feb 5, 2026 · Updated 3 months ago
- The Open-Source Data Annotation Platform ☆1,226 · Feb 19, 2025 · Updated last year
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token" ☆266 · Apr 14, 2025 · Updated last year
- UniTable: Towards a Unified Table Foundation Model ☆531 · Apr 21, 2026 · Updated last month
- [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos ☆27 · Aug 8, 2025 · Updated 9 months ago
- ☆116 · May 19, 2026 · Updated last week
- Using Seq2Seq transformers for Text2SQL task on WikiSQL dataset. ☆12 · Jan 8, 2022 · Updated 4 years ago
- A simple n-gram language model. ☆12 · Sep 11, 2018 · Updated 7 years ago
- Multi-Figurative Language Generation (COLING 2022) ☆12 · Jan 30, 2023 · Updated 3 years ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆2,166 · Apr 14, 2025 · Updated last year
- Code for our ACL'23 paper on how to identify metaphor mappings with the help of GPT-3 ☆11 · May 21, 2025 · Updated last year
- NodeBB Plugin enabling emoji as seen on http://www.emoji-cheat-sheet.com ☆14 · May 13, 2026 · Updated last week
- A full codebase for replicating the results of Nougat, from downloading the arXiv dataset to the final evaluation. It also contains a few fixe… ☆11 · Dec 11, 2023 · Updated 2 years ago
- An implementation of Deutsch–Jozsa algorithm on FPGA. ☆15 · Nov 30, 2020 · Updated 5 years ago
- ☆52 · Mar 5, 2025 · Updated last year
- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute ☆306 · Sep 10, 2024 · Updated last year
- ☆10 · Jan 31, 2021 · Updated 5 years ago
- [EMNLP 2024] Official repository for paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis" ☆21 · Oct 15, 2024 · Updated last year
- Multi-Task instruction-tuned LLaMA ☆14 · May 5, 2023 · Updated 3 years ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆195 · May 31, 2024 · Updated last year