MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
☆24Dec 11, 2024Updated last year
Alternatives and similar repositories for Miner-PDF-Benchmark
Users who are interested in Miner-PDF-Benchmark are comparing it to the libraries listed below.
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆275Dec 6, 2025Updated 5 months ago
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding☆67Feb 10, 2026Updated 2 months ago
- MathNet: A Data-Centric Approach, Dataset and Benchmark Model to Advance Mathematical Expression Recognition☆10Mar 19, 2025Updated last year
- Mathematical Formula Detection (MFD) Dataset for Chinese documents☆34Dec 21, 2022Updated 3 years ago
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis"☆16Sep 2, 2024Updated last year
- Implemented transformer NN block for machine translation, text classification, natural language inference as well as machine reading compr…☆11Mar 1, 2026Updated 2 months ago
- ☆37Jan 26, 2026Updated 3 months ago
- Compute benchmark of table structure recognition.☆28Dec 2, 2025Updated 5 months ago
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist☆34Oct 23, 2024Updated last year
- Resources for Drug Repurposing In Alzheimer's Disease (DRIAD) work☆11Mar 4, 2021Updated 5 years ago
- TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition☆30Feb 5, 2026Updated 3 months ago
- ☆28Oct 14, 2024Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies☆30Nov 18, 2025Updated 5 months ago
- The Open-Source Data Annotation Platform☆1,218Feb 19, 2025Updated last year
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"☆265Apr 14, 2025Updated last year
- UniTable: Towards a Unified Table Foundation Model☆529Apr 21, 2026Updated 2 weeks ago
- [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos☆27Aug 8, 2025Updated 8 months ago
- A simple n-gram language model.☆12Sep 11, 2018Updated 7 years ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception☆2,134Apr 14, 2025Updated last year
- Code for our ACL'23 paper on how to identify metaphor mappings with the help of GPT-3☆11May 21, 2025Updated 11 months ago
- NodeBB Plugin enabling emoji as seen on http://www.emoji-cheat-sheet.com☆14Updated this week
- A full codebase for replicating the results of Nougat from downloading arXiv dataset to the final evaluation. It also contains a few fixe…☆11Dec 11, 2023Updated 2 years ago
- Reading order, Layoutreader☆19May 8, 2025Updated 11 months ago
- A UDP-based network auction house program with a client and a server. Language: python3; UI: pyqt5; multithreaded.☆11Mar 27, 2020Updated 6 years ago
- An implementation of Deutsch–Jozsa algorithm on FPGA.☆14Nov 30, 2020Updated 5 years ago
- ☆52Mar 5, 2025Updated last year
- AIR retriever for Multi-Hop QA (ACL 2020 paper)☆30Jul 18, 2020Updated 5 years ago
- RS-Paper-Hub: A curated collection of remote sensing papers from arXiv, covering areas such as satellites, UAVs, and ground stations (http://rspaper.top/)☆37Updated this week
- A GAN demo project☆13Jan 2, 2020Updated 6 years ago
- MNBVC project: ShareGPT corpus cleaning☆16Oct 4, 2023Updated 2 years ago
- Mozilla's speech-to-text backend☆38Sep 17, 2021Updated 4 years ago
- 360LayoutAnaylsis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute☆307Sep 10, 2024Updated last year
- Codes for the EMNLP'2020 paper "Predicting Clinical Trial Results by Implicit Evidence Integration".☆14Jan 13, 2021Updated 5 years ago
- [EMNLP 2024] Official repository for paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"☆21Oct 15, 2024Updated last year
- Combining encoder-based language models☆11Nov 11, 2021Updated 4 years ago
- Code and data for TACL paper It’s not Rocket Science: Interpreting Figurative Language in Narratives☆15Sep 4, 2023Updated 2 years ago
- Yet another LLM☆10Apr 6, 2023Updated 3 years ago
- ☆11Nov 15, 2016Updated 9 years ago
- Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT.☆15Dec 25, 2019Updated 6 years ago