The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
☆44Sep 24, 2024Updated last year
Alternatives and similar repositories for TextCoT
Users that are interested in TextCoT are comparing it to the libraries listed below
Sorting:
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆145Jun 20, 2024Updated last year
- Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening☆23Dec 16, 2024Updated last year
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.☆48Aug 26, 2024Updated last year
- Document Artifical Intelligence☆201Sep 28, 2025Updated 5 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆91Nov 15, 2024Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆173Sep 25, 2024Updated last year
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆40Dec 5, 2025Updated 2 months ago
- Project page for the ICDAR 2023 Paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping".☆14Dec 21, 2023Updated 2 years ago
- [ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning☆25Sep 6, 2025Updated 5 months ago
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning☆19Nov 22, 2025Updated 3 months ago
- [AAAI-26] Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?☆26Dec 14, 2025Updated 2 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆41Apr 11, 2025Updated 10 months ago
- ☆20Mar 3, 2025Updated 11 months ago
- ☆21Oct 10, 2023Updated 2 years ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆21May 28, 2025Updated 9 months ago
- The code of IJCAI2022 paper, Declaration-based Prompt Tuning for Visual Question Answering☆20May 10, 2022Updated 3 years ago
- ☆75Jul 31, 2025Updated 7 months ago
- [EMNLP 2025] Official codebase for Rearank: Reasoning Re-ranking Agent☆33Aug 20, 2025Updated 6 months ago
- Dettoolchain: A new prompting paradigm to unleash detection ability of MLLM☆45Oct 12, 2024Updated last year
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?“☆71May 19, 2025Updated 9 months ago
- [NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models