Token-family/TokenFD



[ICCV2025] A Token-level Text Image Foundation Model for Document Understanding

☆132

Alternatives and similar repositories for TokenFD

Users who are interested in TokenFD are comparing it to the repositories listed below.


TenMilesLotus / DTSM
Code and data for the paper: DTSM: Toward Dense Table Structure Recognition with Text Query Encoder and Adjacent Feature Aggregator
☆12 · Apr 28, 2024 · Updated last year
SJTU-DeepVisionLab / FreeReal
[ECCV2024] Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
☆19 · Sep 7, 2024 · Updated last year
bytedance / WildDoc
The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?”
☆71 · May 19, 2025 · Updated 9 months ago
whlscut / DocLayLLM
[CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
☆27 · Dec 18, 2025 · Updated 2 months ago
SCUT-DLVCLab / OCR-Reasoning
[ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
☆73 · Dec 17, 2025 · Updated 2 months ago
SJTU-DeepVisionLab / PosFormer
[ECCV2024] PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
☆84 · Apr 10, 2025 · Updated 10 months ago
JCruan519 / GIST
(ACM MM24) This is the official repository of GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction.
☆11 · Jan 28, 2024 · Updated 2 years ago
ZZZHANG-jx / WMeter-Reader
[TIM 2025] Towards Accurate Readings of Water Meters by Eliminating Transition Error: New Dataset and Effective Solution
☆12 · Mar 5, 2025 · Updated 11 months ago
muhd-umer / pyramidtabnet
Official PyTorch implementation of PyramidTabNet: Transformer-based Table Recognition in Image-based Documents
☆28 · Oct 5, 2024 · Updated last year
shannanyinxiang / UPOCR
Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)
☆67 · Jun 6, 2024 · Updated last year
shuyansy / Visual-Text-Processing-survey
The official project of the paper "Visual Text Processing: A Comprehensive Review and Unified Evaluation"
☆97 · Oct 20, 2025 · Updated 4 months ago
shannanyinxiang / ViTEraser
Official implementation of ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining (AAAI 20…
☆62 · Jul 4, 2024 · Updated last year
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆66 · Nov 1, 2024 · Updated last year
onealwj / MVLT
PyTorch implementation of BMVC2022 paper Masked Vision-Language Transformers for Scene Text Recognition
☆29 · Nov 11, 2022 · Updated 3 years ago
ZeningLin / PEneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
☆41 · Apr 7, 2025 · Updated 10 months ago
VamosC / CLIP4STR
An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
☆146 · Nov 14, 2025 · Updated 3 months ago
SpursGoZmy / Table-LLaVA
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …
☆225 · Jun 12, 2025 · Updated 8 months ago
ZZZHANG-jx / DocKylin
[AAAI 2025] DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
☆36 · Jun 1, 2025 · Updated 9 months ago
TongkunGuan / RFN
[TCSVT2022] Industrial Scene Text Detection
☆81 · Mar 3, 2023 · Updated 3 years ago
LayTextLLM / LayTextLLM
☆102 · Dec 23, 2024 · Updated last year
TongkunGuan / Text-Related-Papers
Update the latest text-related papers from top conferences
☆27 · Mar 12, 2025 · Updated 11 months ago
SCUT-DLVCLab / RFUND
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…
☆20 · Dec 4, 2024 · Updated last year
Mountchicken / Union14M
[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective
☆201 · Nov 1, 2023 · Updated 2 years ago
ecnuljzhang / brush-your-text
☆100 · Jan 3, 2024 · Updated 2 years ago
bytedance / E2STR
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
☆55 · Jun 14, 2024 · Updated last year
wenwenyu / TCM
Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)
☆201 · Jun 17, 2024 · Updated last year
ZZR8066 / SEM
☆19 · Mar 10, 2023 · Updated 2 years ago
PanguIR / MRAGSurvey
A Survey of Multimodal Retrieval-Augmented Generation
☆20 · Nov 3, 2025 · Updated 4 months ago
xhli-git / DocSAM
☆31 · Apr 8, 2025 · Updated 10 months ago
ViTAE-Transformer / SAMText
The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting"
☆16 · May 3, 2023 · Updated 2 years ago
yeungchenwa / HDR
[AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents
☆106 · Jul 15, 2025 · Updated 7 months ago
ZYM-PKU / UDiffText
[ECCV 2024] Official repo for UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diff…
☆234 · Feb 14, 2025 · Updated last year
wangyuxin87 / Tampered-IC13
☆23 · Oct 25, 2022 · Updated 3 years ago
zzyhlyoko / DCTC
☆42 · Sep 2, 2023 · Updated 2 years ago
Wei-ucas / TPSNet
☆27 · Nov 29, 2023 · Updated 2 years ago
ZJU-REAL / cooper
☆25 · Aug 19, 2025 · Updated 6 months ago
bytedance / TextHarmony
The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation
☆130 · Nov 18, 2024 · Updated last year
TongkunGuan / CCD
[ICCV2023] Self-supervised Character-to-Character Distillation for Text Recognition
☆151 · Apr 20, 2024 · Updated last year
xdxie / WAS_WordArt-Segmentation
The official codes and datasets for Artistic Text Segmentation (ECCV 2024).
☆28 · Sep 24, 2025 · Updated 5 months ago