MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
☆24 · Dec 11, 2024 · Updated last year
Alternatives and similar repositories for Miner-PDF-Benchmark
Users who are interested in Miner-PDF-Benchmark are comparing it to the libraries listed below.
- A Python package for interacting with the MinerU Vision-Language Model. ☆109 · Updated this week
- A high-efficiency open-source toolkit for the table-to-LaTeX task ☆276 · Dec 6, 2025 · Updated 3 months ago
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding ☆60 · Feb 10, 2026 · Updated last month
- This repo is used to release the ArxivFormula dataset. ☆35 · Nov 12, 2024 · Updated last year
- ☆11 · Nov 1, 2022 · Updated 3 years ago
- Continuous diffusion for layout generation ☆54 · Feb 19, 2025 · Updated last year
- Implemented transformer NN blocks for machine translation, text classification, natural language inference, and machine reading compr… ☆11 · Mar 1, 2026 · Updated 3 weeks ago
- ☆37 · Jan 26, 2026 · Updated 2 months ago
- Compute benchmark of table structure recognition. ☆28 · Dec 2, 2025 · Updated 3 months ago
- TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition ☆26 · Feb 5, 2026 · Updated last month
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist ☆35 · Oct 23, 2024 · Updated last year
- Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICP… ☆10 · Nov 20, 2020 · Updated 5 years ago
- Official PyTorch implementation for "Where You Edit is What You Get: Text-Guided Image Editing with Region-Based Attention" (Pattern Reco… ☆10 · Oct 1, 2024 · Updated last year
- ☆11 · Mar 4, 2021 · Updated 5 years ago
- ☆15 · Sep 28, 2020 · Updated 5 years ago
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆153 · Jan 13, 2025 · Updated last year
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token" ☆261 · Apr 14, 2025 · Updated 11 months ago
- UniTable: Towards a Unified Table Foundation Model ☆529 · Jun 4, 2024 · Updated last year
- [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos ☆25 · Aug 8, 2025 · Updated 7 months ago
- A comprehensive collection of data quality resources, tools, papers, and projects across various data types including traditional data, L… ☆26 · Aug 29, 2025 · Updated 6 months ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 · Apr 28, 2023 · Updated 2 years ago
- Using Seq2Seq transformers for the Text2SQL task on the WikiSQL dataset. ☆12 · Jan 8, 2022 · Updated 4 years ago
- Flux training code (LoRA) for UniTEX ☆24 · Jun 8, 2025 · Updated 9 months ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆2,059 · Apr 14, 2025 · Updated 11 months ago
- Multi-Figurative Language Generation (COLING 2022) ☆12 · Jan 30, 2023 · Updated 3 years ago
- Official repository for ALT (ALignment with Textual feedback). ☆10 · Jul 25, 2024 · Updated last year
- Code for our ACL'23 paper on how to identify metaphor mappings with the help of GPT-3 ☆11 · May 21, 2025 · Updated 10 months ago
- NodeBB Plugin enabling emoji as seen on http://www.emoji-cheat-sheet.com ☆14 · Mar 1, 2026 · Updated 3 weeks ago
- [ACL 2023] TeAST: Temporal Knowledge Graph Embedding via Archimedean Spiral Timeline ☆12 · Mar 4, 2024 · Updated 2 years ago
- A full codebase for replicating the results of Nougat from downloading arXiv dataset to the final evaluation. It also contains a few fixe… ☆11 · Dec 11, 2023 · Updated 2 years ago
- ☆13 · Mar 13, 2023 · Updated 3 years ago
- Reading order, Layoutreader ☆19 · May 8, 2025 · Updated 10 months ago
- Ongoing research project for code & math LLMs ☆27 · Jul 4, 2025 · Updated 8 months ago
- ☆52 · Mar 5, 2025 · Updated last year
- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute ☆307 · Sep 10, 2024 · Updated last year
- Test data for the paper “αExtractor: a web server for automatic extraction of chemical structure from literature” ☆18 · Dec 26, 2023 · Updated 2 years ago
- ☆10 · Jan 31, 2021 · Updated 5 years ago
- ☆13 · May 8, 2025 · Updated 10 months ago
- Recommendation system, with a web front end built on Django ☆12 · Nov 1, 2017 · Updated 8 years ago