【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
☆128 Jun 4, 2025 Updated 9 months ago
Alternatives and similar repositories for PDF-Wukong
Users who are interested in PDF-Wukong are comparing it to the libraries listed below.
- VisuRiddles: Fine-grained Perception is an important thing for Multimodal Large Models in Riddles Solving ☆18 Oct 22, 2025 Updated 4 months ago
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight) ☆1,949 Jan 24, 2026 Updated last month
- Card/ID and document detection and rectification ☆82 Sep 18, 2024 Updated last year
- ☆16 Apr 21, 2025 Updated 10 months ago
- [ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning ☆73 Dec 17, 2025 Updated 3 months ago
- ☆19 Sep 11, 2024 Updated last year
- [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document" ☆17 Dec 1, 2023 Updated 2 years ago
- [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f… ☆20 Dec 4, 2024 Updated last year
- ☆42 Sep 2, 2023 Updated 2 years ago
- convert paddleOCR to torchOCR, ppocr-v3, ppocr-v4, onnx, openvino ☆33 Aug 16, 2023 Updated 2 years ago
- ☆12 Jul 8, 2021 Updated 4 years ago
- ☆31 Dec 18, 2025 Updated 3 months ago