[ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"
☆45Feb 27, 2026Updated 3 months ago
Alternatives and similar repositories for TextCoT
Users that are interested in TextCoT are comparing it to the libraries listed below.
- Official PyTorch implementation for ACM MM22 "UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior"☆25Aug 5, 2024Updated last year
- Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening☆24Dec 16, 2024Updated last year
- Document Artificial Intelligence☆201Sep 28, 2025Updated 8 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.☆52Aug 26, 2024Updated last year
- Inference, training and evaluation code for our paper "DocMatcher: Document Image Dewarping via Structural and Textual Line Matching" (WA…☆54Jul 1, 2025Updated 11 months ago
- Project page for the ICDAR 2023 Paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping".☆13Dec 21, 2023Updated 2 years ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆33Mar 10, 2026Updated 3 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- Unofficial implementation of DocMAE (WIP): Document Image Rectification via Self-supervised Representation Learning☆20Dec 20, 2023Updated 2 years ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆49Mar 2, 2026Updated 3 months ago
- [TAI 2023] Appearance Enhancement for Camera-captured Document Images in the Wild☆58Aug 28, 2025Updated 9 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆92Nov 15, 2024Updated last year
- ROOT: VLM based System for Indoor Scene Understanding and Beyond☆41Jan 22, 2025Updated last year
- ☆80Jul 31, 2025Updated 10 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆21May 28, 2025Updated last year
- DetToolChain: A new prompting paradigm to unleash detection ability of MLLM☆45Oct 12, 2024Updated last year
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?”☆74May 19, 2025Updated last year
- [ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning☆26Sep 6, 2025Updated 9 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆174Sep 25, 2024Updated last year
- ☆21Oct 10, 2023Updated 2 years ago
- ☆14Sep 6, 2024Updated last year
- Code from our paper "Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction " (ICCVW) 2023.☆28Feb 7, 2024Updated 2 years ago
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning☆19Nov 22, 2025Updated 6 months ago
- ☆31Feb 10, 2025Updated last year
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr…☆37Jul 3, 2025Updated 11 months ago
- Awesome lists about all kinds of awesome skills to help you go out of 35 crisis, and most important, to tell you how to enjoy your life.☆19Jul 9, 2022Updated 3 years ago
- Official code for DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degra…☆43Mar 20, 2026Updated 2 months ago
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.☆41Apr 7, 2025Updated last year
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆40Apr 11, 2025Updated last year
- [NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models☆48Mar 18, 2024Updated 2 years ago
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models☆99Mar 22, 2024Updated 2 years ago
- PyTorch Implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.☆16May 1, 2025Updated last year
- This is an unofficial implementation of the EMNLP 2023 paper: Reading Order Matters: Information Extraction from Visually-rich Documents …☆16May 29, 2024Updated 2 years ago
- The code for paper: "DC-Net: Divide-and-Conquer for Salient Object Detection"☆20Aug 30, 2024Updated last year
- [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"☆17Dec 1, 2023Updated 2 years ago
- The official code for “SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning”, ICCV, 20…☆33Jul 21, 2024Updated last year
- Hanja Understanding Evaluation Dataset☆15May 2, 2022Updated 4 years ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆374Apr 20, 2025Updated last year
- Scaffold Prompting to promote LMMs☆46Dec 16, 2024Updated last year