[ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"
☆44Feb 27, 2026Updated 3 weeks ago
Alternatives and similar repositories for TextCoT
Users that are interested in TextCoT are comparing it to the libraries listed below
Sorting:
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆146Jun 20, 2024Updated last year
- Official PyTorch implementation for ACM MM22 "UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior"☆25Aug 5, 2024Updated last year
- Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening☆23Dec 16, 2024Updated last year
- Document Artifical Intelligence☆202Sep 28, 2025Updated 5 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.☆48Aug 26, 2024Updated last year
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆34Mar 10, 2026Updated last week
- Inference, training and evaluation code for our paper "DocMatcher: Document Image Dewarping via Structural and Textual Line Matching" (WA…☆52Jul 1, 2025Updated 8 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- Unofficial implementation of DocMAE (WIP): Document Image Rectification via Self-supervised Representation Learning☆20Dec 20, 2023Updated 2 years ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆91Nov 15, 2024Updated last year
- ☆77Jul 31, 2025Updated 7 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆21May 28, 2025Updated 9 months ago
- Dettoolchain: A new prompting paradigm to unleash detection ability of MLLM☆45Oct 12, 2024Updated last year
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?“☆72May 19, 2025Updated 10 months ago
- [ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning☆25Sep 6, 2025Updated 6 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆174Sep 25, 2024Updated last year
- ☆21Oct 10, 2023Updated 2 years ago
- ☆14Sep 6, 2024Updated last year
- ☆28Feb 10, 2025Updated last year
- Code from our paper "Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction " (ICCVW) 2023.☆28Feb 7, 2024Updated 2 years ago
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning☆19Nov 22, 2025Updated 4 months ago
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr…☆34Jul 3, 2025Updated 8 months ago
- Awesome lists about all kinds of awesome skills to help you go out of 35 crisis, and most important, to tell you how to enjoy your life.☆19Jul 9, 2022Updated 3 years ago
- Official code for DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degra…☆42May 28, 2025Updated 9 months ago
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.☆41Apr 7, 2025Updated 11 months ago
- A zero-shot faithfulness evaluation metric for text summarization☆11Oct 17, 2023Updated 2 years ago
- [NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models☆49Mar 18, 2024Updated 2 years ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…☆562Jan 4, 2025Updated last year
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models☆99Mar 22, 2024Updated last year
- This is an unofficial implementation to the EMNLP 2023 paper: Reading Order Matters: Information Extraction from Visually-rich Documents …☆16May 29, 2024Updated last year
- The code for paper: "DC-Net: Divide-and-Conquer for Salient Object Detection"☆20Aug 30, 2024Updated last year
- [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"☆17Dec 1, 2023Updated 2 years ago
- Hanja Understanding Evaluation Dataset☆15May 2, 2022Updated 3 years ago
- Source code of the paper: Video Inpainting Localization with Contrastive Learning, IEEE SPL 2025.☆12Aug 9, 2025Updated 7 months ago
- [CVPR'25] Official code of paper "Mimic In-Context Learning for Multimodal Tasks"☆24Mar 10, 2026Updated last week
- The official code for “SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning”, ICCV, 20…☆31Jul 21, 2024Updated last year
- Collections of object goal navigation papers in recent top-tier conferences.☆14Sep 24, 2022Updated 3 years ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos☆33May 27, 2025Updated 9 months ago
- Scaffold Prompting to promote LMMs☆46Dec 16, 2024Updated last year