yh-hust/PDF-Wukong

【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

☆131

Alternatives and similar repositories for PDF-Wukong

Users that are interested in PDF-Wukong are comparing it to the libraries listed below.

yh-hust / VisuRiddles
VisuRiddles: Fine-grained Perception is an important thing for Multimodal Large Models in Riddles Solving
☆20 · Jun 9, 2026 · Updated last month
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,949 · Jun 2, 2026 · Updated last month
chenxn2020 / GOSE
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
☆17 · Dec 1, 2023 · Updated 2 years ago
SCUT-DLVCLab / RFUND
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…
☆21 · Dec 4, 2024 · Updated last year
dle666 / GeoFocus
☆27 · Jul 5, 2026 · Updated 2 weeks ago
BADBADBADBOY / CardDetectRotate
Detection and rectification of ID cards and documents
☆87 · Sep 18, 2024 · Updated last year
zzyhlyoko / DCTC
☆42 · Sep 2, 2023 · Updated 2 years ago
saicoco / SA-Text
PyTorch implementation of SA-Text: Simple but Accurate Detector for Text of Arbitrary Shapes
☆42 · Jun 25, 2020 · Updated 6 years ago
BADBADBADBOY / baipiaoOCR
Convert PaddleOCR to torchOCR: ppocr-v3, ppocr-v4, onnx, openvino
☆33 · Aug 16, 2023 · Updated 2 years ago
crossLi / Ultra_light_OCR_No.9
☆12 · Jul 8, 2021 · Updated 5 years ago
adlnlp / mmvqa
☆19 · Sep 11, 2024 · Updated last year
Gyann-z / FDP
☆16 · Apr 21, 2025 · Updated last year
echo840 / LIRA
[ICCV 2025] LIRA
☆22 · Nov 25, 2025 · Updated 7 months ago
LayTextLLM / LayTextLLM
☆103 · Dec 23, 2024 · Updated last year
OKC13 / General-Documents-Layout-parser
General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser
☆47 · Jun 13, 2024 · Updated 2 years ago
SCUT-DLVCLab / OCR-Reasoning
[ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
☆76 · May 26, 2026 · Updated last month
HCIILAB / M5HisDoc
☆34 · Dec 18, 2025 · Updated 7 months ago
BADBADBADBOY / genete_ocr_data
OCR data, detection data, recognition data
☆29 · Mar 24, 2020 · Updated 6 years ago
Yuliang-Liu / MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆870 · Updated this week
shannanyinxiang / PageNet
Official implementation of PageNet (IJCV 2022)
☆82 · Oct 31, 2022 · Updated 3 years ago
wenwenyu / MASTER-pytorch
Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)
☆281 · Dec 26, 2021 · Updated 4 years ago
wenwenyu / TCM
Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)
☆202 · Jun 17, 2024 · Updated 2 years ago
SpursGoZmy / Table-LLaVA
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …
☆227 · Jun 12, 2025 · Updated last year
Caiyuan-Zheng / Consistency_Regularization_STR
Code for the paper "Pushing the Performance Limit of Scene Text Recognizer without Human Annotation" (CVPR 2022)
☆28 · Jul 6, 2022 · Updated 4 years ago
namtuanly / WikiTableSet
WikiTableSet: The largest publicly available image-based table recognition dataset in three languages built from Wikipedia
☆32 · Jun 12, 2025 · Updated last year
limengyang1992 / seq2seq-layout-analysis
End-to-end layout analysis based on seq2seq
☆132 · Mar 8, 2021 · Updated 5 years ago
bytedance / TextHarmony
The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation
☆127 · Nov 18, 2024 · Updated last year
tkianai / ICDAR2019-tools
Tools for ICDAR2019 competitions (fifth place)
☆11 · May 6, 2019 · Updated 7 years ago
SCUT-DLVCLab / GPT-4V_OCR
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
☆128 · Nov 13, 2023 · Updated 2 years ago
IBM / SynthTabNet
Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files
☆154 · Sep 17, 2025 · Updated 10 months ago
Yuliang-Liu / bezier_curve_text_spotting
A PyTorch implementation of "ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network" (CVPR 2020 oral)
☆433 · Apr 28, 2022 · Updated 4 years ago
shannanyinxiang / ViTEraser
Official implementation of ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining (AAAI 20…
☆66 · Jul 4, 2024 · Updated 2 years ago
Yuliang-Liu / AlphaOracle
[Innovation 2026] Oracle bone script decipherment via human-workflow-inspired deep learning
☆31 · Jun 22, 2026 · Updated 3 weeks ago
namtuanly / MTL-TabNet
MTL-TabNet: Multi-task Learning based Model for Image-based Table Recognition
☆103 · May 30, 2024 · Updated 2 years ago
WenmuZhou / TableGeneration
Generate table images via browser rendering
☆238 · Apr 10, 2024 · Updated 2 years ago
buptlihang / CDLA
CDLA: A Chinese document layout analysis (CDLA) dataset
☆293 · Sep 13, 2021 · Updated 4 years ago
lcy0604 / QT-TextSR
This repository is the implementation of "QT-TextSR: Enhancing scene text image super-resolution via efficient interaction with text reco…
☆20 · Jul 9, 2025 · Updated last year
RapidAI / TableStructureRec
Collects the best currently open-source table recognition models, refines pre- and post-processing, and converts the models to ONNX
☆954 · Aug 3, 2025 · Updated 11 months ago
cv-small-snails / Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
☆404 · Dec 12, 2024 · Updated last year