bzluan/TextCoT

[ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"

☆45

Alternatives and similar repositories for TextCoT

Users that are interested in TextCoT are comparing it to the libraries listed below.

harrytea / UDoc-GAN
Official PyTorch implementation for ACM MM22 "UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior"
☆25 · Aug 5, 2024 · Updated last year
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142 · Jun 20, 2024 · Updated 2 years ago
harrytea / Awesome-Document-Understanding
Document Artificial Intelligence
☆201 · Sep 28, 2025 · Updated 9 months ago
BunnySoCrazy / LA-DocFlatten
Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening
☆24 · Dec 16, 2024 · Updated last year
fh2019ustc / DeepEraser
The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.
☆53 · Aug 26, 2024 · Updated last year
FelixHertlein / doc-matcher
Inference, training and evaluation code for our paper "DocMatcher: Document Image Dewarping via Structural and Textual Line Matching" (WA…
☆55 · Jul 1, 2025 · Updated last year
FelixHertlein / inv3d
Project page for the ICDAR 2023 Paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping".
☆13 · Dec 21, 2023 · Updated 2 years ago
DreamMr / HR-Bench
PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…
☆49 · Mar 2, 2026 · Updated 4 months ago
ZZZHANG-jx / GCDRNet
[TAI 2023] Appearance Enhancement for Camera-captured Document Images in the Wild
☆58 · Aug 28, 2025 · Updated 10 months ago
harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆16 · Nov 28, 2024 · Updated last year
whlscut / DocLayLLM
[CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
☆30 · Dec 18, 2025 · Updated 7 months ago
Line-Kite / GraphLayoutLM
☆14 · Sep 6, 2024 · Updated last year
DataArcTech / RagVL
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆92 · Nov 15, 2024 · Updated last year
RylonW / DocNLC
Official code for DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degra…
☆44 · Mar 20, 2026 · Updated 4 months ago
GaryJiajia / OFv2_ICL_VQA
[CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering
☆21 · May 28, 2025 · Updated last year
yixuan730 / DetToolChain
DetToolChain: A new prompting paradigm to unleash detection ability of MLLM
☆45 · Oct 12, 2024 · Updated last year
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆174 · Sep 25, 2024 · Updated last year
archiki / RepARe
☆21 · Oct 10, 2023 · Updated 2 years ago
fansunqi / VideoTool
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23 · May 18, 2026 · Updated 2 months ago
FelixHertlein / illtrtemplate-model
Code from our paper "Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction" (ICCVW) 2023.
☆29 · Feb 7, 2024 · Updated 2 years ago
Fu-Dayuan / AgentRefine
(ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning
☆20 · Nov 22, 2025 · Updated 7 months ago
chenxn2020 / GOSE
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
☆17 · Dec 1, 2023 · Updated 2 years ago
GuangyanS / Sys2-LLaVA
☆31 · Feb 10, 2025 · Updated last year
ChunelFeng / awesome-HowTo
Awesome lists about all kinds of awesome skills to help you go out of 35 crisis, and most important, to tell you how to enjoy your life.
☆19 · Jul 9, 2022 · Updated 4 years ago
ZeningLin / PEneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
☆41 · Apr 7, 2025 · Updated last year
SooLab / DDCOT
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆48 · Mar 18, 2024 · Updated 2 years ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41 · Apr 11, 2025 · Updated last year
Social-AI-Studio / MATK
Official repository for ACM Multimedia'23 paper "MATK: The Meme Analytical Tool Kit"
☆14 · May 29, 2024 · Updated 2 years ago
lezhang7 / Rearank
[EMNLP 2025] Official codebase for Rearank: Reasoning Re-ranking Agent
☆40 · Aug 20, 2025 · Updated 11 months ago
pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆592 · Jan 4, 2025 · Updated last year
bytedance / WildDoc
The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?”
☆74 · May 19, 2025 · Updated last year
fh2019ustc / SimFIR
The official code for “SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning”, ICCV, 20…
☆33 · Jul 21, 2024 · Updated 2 years ago
salesforce / QVR-SimpleDLM
PyTorch Implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.
☆16 · May 1, 2025 · Updated last year
EAI-MCC / Awesome-ObjectGoal-Navigation
Collections of object goal navigation papers in recent top-tier conferences.
☆14 · Sep 24, 2022 · Updated 3 years ago
multimediaFor / ViLocal
Source code of the paper: Video Inpainting Localization with Contrastive Learning, IEEE SPL 2025.
☆12 · Aug 9, 2025 · Updated 11 months ago
2bgm / KIE-HVQA
☆13 · Jun 10, 2025 · Updated last year
wannature / Detective-A-Dynamic-Integrated-Uncertainty-Valuation-Framework
PyTorch implementation of Detective
☆13 · Jul 11, 2024 · Updated 2 years ago
dhg-wei / MCL
(ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
☆28 · Sep 27, 2024 · Updated last year
hmvu-nv / vie_geo_llm
This repo provides Geometric LayoutLM for Vietnamese documents and code for exporting to ONNX
☆14 · Mar 3, 2024 · Updated 2 years ago