ucaslcl/Fox

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ucaslcl/Fox)

ucaslcl / Fox

official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"

☆196

Alternatives and similar repositories for Fox

Users that are interested in Fox are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

LingyvKong / OneChart
View on GitHub
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
☆266Apr 14, 2025Updated last year
ucasyjz / VIP
View on GitHub
[ACCV 2024 Poster] official code for "VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model"
☆10Sep 28, 2024Updated last year
Ucas-HaoranWei / Vary-tiny-600k
View on GitHub
Vary-tiny codebase upon LAVIS （for training from scratch）and a PDF image-text pairs data (about 600k including English/Chinese)
☆89Sep 21, 2024Updated last year
Ucas-HaoranWei / Vary
View on GitHub
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
☆1,889Dec 30, 2024Updated last year
Ucas-HaoranWei / Vary-family
View on GitHub
☆57Jan 23, 2024Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
nttmdlab-nlp / InstructDoc
View on GitHub
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162May 31, 2024Updated 2 years ago
Ucas-HaoranWei / Vary-toy
View on GitHub
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
☆630Dec 30, 2024Updated last year
LayTextLLM / LayTextLLM
View on GitHub
☆103Dec 23, 2024Updated last year
X-PLUG / mPLUG-DocOwl
View on GitHub
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
☆2,408May 30, 2025Updated last year
guoxy25 / Ocean-OCR
View on GitHub
☆48Feb 7, 2025Updated last year
Ucas-HaoranWei / GOT-OCR2.0
View on GitHub
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆8,155Feb 10, 2025Updated last year
harrytea / Awesome-Document-Understanding
View on GitHub
Document Artifical Intelligence
☆201Sep 28, 2025Updated 9 months ago
Ucas-HaoranWei / Aircraft-KP
View on GitHub
Keypoint dataset for airplane
☆10Dec 28, 2019Updated 6 years ago
Alpha-Innovator / DocGenome
View on GitHub
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
☆156Jan 13, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
chenxn2020 / GOSE
View on GitHub
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
☆17Dec 1, 2023Updated 2 years ago
intsig-textin / markdown_tester
View on GitHub
如需体验textin文档解析，请点击https://cc.co/16YSIy
☆129Jun 28, 2025Updated last year
Yuliang-Liu / MultimodalOCR
View on GitHub
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆870Updated this week
GbotHQ / ocr-dataset-rendering
View on GitHub
☆39Oct 7, 2023Updated 2 years ago
FreeOCR-AI / layoutreader
View on GitHub
A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.
☆322Aug 15, 2025Updated 11 months ago
felix-schmitt / MathNet
View on GitHub
MathNet: A Data-Centric Approach, Dataset and Benchmark Model to Advance Mathematical Expression Recognition
☆10Mar 19, 2025Updated last year
InternScience / StructEqTable-Deploy
View on GitHub
A High-efficiency Open-source Toolkit for Table-to-Latex Task
☆276Dec 6, 2025Updated 7 months ago
WenmuZhou / TableGeneration
View on GitHub
通过浏览器渲染生成表格图像
☆238Apr 10, 2024Updated 2 years ago
MaxKinny / TabRecSet
View on GitHub
A large scale camera-taken table detection and recognition dataset.
☆150Apr 9, 2026Updated 3 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
AlibabaResearch / AdvancedLiterateMachinery
View on GitHub
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…
☆1,833Mar 17, 2026Updated 4 months ago
MosRat / got.cpp
View on GitHub
Using Llam.cpp and onnxruntime to accelerate inference of GOT-OCR2.0
☆15Mar 6, 2025Updated last year
BaofengZan / GOT-OCRv2-onnx
View on GitHub
用于学习GOT/Qwen/OnnxLLm
☆55Oct 8, 2024Updated last year
Chunchunwumu / SEMv3
View on GitHub
The official PyTorch implementation of SEMv3.
☆53May 26, 2024Updated 2 years ago
SpursGoZmy / Table-LLaVA
View on GitHub
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …
☆227Jun 12, 2025Updated last year
iiclab / DecompST
View on GitHub
☆15Nov 26, 2023Updated 2 years ago
Yuliang-Liu / Monkey
View on GitHub
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,949Jun 2, 2026Updated last month
opendatalab / OmniDocBench
View on GitHub
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
☆1,900Jun 26, 2026Updated 3 weeks ago
herobd / dessurt
View on GitHub
Official implementation for Dessurt: Document end-to-end self-supervised understanding and recognition transformer
☆62Jan 11, 2023Updated 3 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
LukeForeverYoung / UReader
View on GitHub
☆142Feb 13, 2024Updated 2 years ago
CIawevy / TextPecker
View on GitHub
[CVPR2026] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
☆53Updated this week
JG1VPP / MuTabNet
View on GitHub
ICDAR 2024/2026 Table OCR Model
☆39Jun 16, 2026Updated last month
opendatalab / UniMERNet
View on GitHub
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
☆492Sep 28, 2025Updated 9 months ago
namtuanly / WikiTableSet
View on GitHub
WikiTableSet: A largest publicly available image-based table recognition dataset in three languages built from Wikipedia
☆32Jun 12, 2025Updated last year
fengbinzhu / Doc2SoarGraph
View on GitHub
The repo of the Doc2SoarGraph framework
☆10Sep 17, 2024Updated last year
HCIILAB / M6Doc
View on GitHub
☆163May 8, 2025Updated last year