MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
☆24Dec 11, 2024Updated last year
Alternatives and similar repositories for Miner-PDF-Benchmark
Users who are interested in Miner-PDF-Benchmark are comparing it to the libraries listed below.
- Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding☆57Feb 10, 2026Updated 3 weeks ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆458Sep 28, 2025Updated 5 months ago
- Converts documents (pdf/html/doc/docx/ppt/pptx) to Markdown☆48Jul 23, 2024Updated last year
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆275Dec 6, 2025Updated 2 months ago
- A Python package for interacting with the MinerU Vision-Language Model.☆106Feb 5, 2026Updated 3 weeks ago
- Next-generation Punkt sentence boundary detection with zero dependencies☆29Nov 18, 2025Updated 3 months ago
- ☆23Sep 19, 2024Updated last year
- ☆37Jan 26, 2026Updated last month
- A RAG system dedicated to processing Markdown files converted from visually rich documents☆10Mar 16, 2025Updated 11 months ago
- ☆13Mar 13, 2023Updated 2 years ago
- 🧪 A minimal visual tool to verify YOLO-based object detection algorithms in custom scenarios.☆14Feb 20, 2026Updated last week
- [ACL 2025] The official PyTorch implementation of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement☆39May 28, 2025Updated 9 months ago
- WeChat mini program for enterprise employee business cards, online chat, and a store (cloud development)☆10Jun 1, 2022Updated 3 years ago
- ☆10Oct 31, 2020Updated 5 years ago
- Generates training datasets for text detection☆12Jul 1, 2020Updated 5 years ago
- TSDG: An efficient index graph for graph-based nearest neighbor search☆10Jul 14, 2022Updated 3 years ago
- Rust implementation of RaBitQ☆10Feb 4, 2026Updated last month
- UniTable: Towards a Unified Table Foundation Model☆525Jun 4, 2024Updated last year
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆312Aug 15, 2025Updated 6 months ago
- Code for our ACL'23 paper on how to identify metaphor mappings with the help of GPT-3☆11May 21, 2025Updated 9 months ago
- [2022CCL]☆13Sep 28, 2024Updated last year
- A static site generator built with node.js☆15Jul 15, 2020Updated 5 years ago
- Scrapes 3D building models from HERE Maps☆12Jun 29, 2017Updated 8 years ago
- WeChat official account☆10Jul 24, 2023Updated 2 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot…☆11Jan 26, 2021Updated 5 years ago
- ☆12Jan 15, 2019Updated 7 years ago
- Simplifies the prediction process for a fine-tuned BERT model☆11Jun 19, 2019Updated 6 years ago
- A PyTorch implementation of SimSiam based on CVPR 2021 paper "Exploring Simple Siamese Representation Learning"☆12Mar 23, 2021Updated 4 years ago
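- E-book on industrial-grade Chinese speech recognition systems☆13Oct 30, 2020Updated 5 years ago
- Alibaba Tianchi competition, "Impression Yancheng · Digital Innovation for the Future" big data contest: Yancheng vehicle license plate registration volume forecasting☆12Mar 22, 2018Updated 7 years ago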
- Combining encoder-based language models☆11Nov 11, 2021Updated 4 years ago
- Server wrapper for ml models☆11Sep 11, 2019Updated 6 years ago
- A chatbot built using a rule-based approach.☆11Feb 27, 2018Updated 8 years ago
- A minimal re-implementation of orthogonal fine-tuning (OFT) for LLMs. Based on nanoGPT and minLoRA.☆13Nov 17, 2023Updated 2 years ago
- Deep Autoencoding Predictive Components☆10Mar 4, 2021Updated 5 years ago
- An augmented reality application that helps users learn about wildlife animals by creating an augmented zoo and spread awar…☆10Nov 1, 2018Updated 7 years ago
- A GAN demo project☆12Jan 2, 2020Updated 6 years ago
- A RESTful API-style project template based on Tornado 6.x, for quickly building enterprise-grade, high-performance, high-concurrency servers.☆12Nov 22, 2022Updated 3 years ago
- Uses cosine similarity to evaluate the distance between two texts (0 to 1).☆16Feb 27, 2019Updated 7 years ago