[ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"
☆44Feb 27, 2026Updated last month
Alternatives and similar repositories for TextCoT
Users that are interested in TextCoT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆146Jun 20, 2024Updated last year
- Official PyTorch implementation for ACM MM22 "UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior"☆25Aug 5, 2024Updated last year
- Code and Dataset for our paper: Layout-Aware Single-Image Document Flattening☆23Dec 16, 2024Updated last year
- Document Artifical Intelligence☆202Sep 28, 2025Updated 6 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.☆51Aug 26, 2024Updated last year
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- Inference, training and evaluation code for our paper "DocMatcher: Document Image Dewarping via Structural and Textual Line Matching" (WA…☆53Jul 1, 2025Updated 9 months ago
- Project page for the ICDAR 2023 Paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping".☆13Dec 21, 2023Updated 2 years ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆33Mar 10, 2026Updated last month
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆47Mar 2, 2026Updated last month
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆91Nov 15, 2024Updated last year
- [ACM MM 2022] Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild☆25Aug 12, 2022Updated 3 years ago
- ☆80Jul 31, 2025Updated 8 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆21May 28, 2025Updated 10 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?“☆73May 19, 2025Updated 10 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆176Sep 25, 2024Updated last year
- ☆14Sep 6, 2024Updated last year
- ☆28Feb 10, 2025Updated last year
- Code from our paper "Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction " (ICCVW) 2023.☆28Feb 7, 2024Updated 2 years ago
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning☆18Nov 22, 2025Updated 4 months ago
- [ICLR 2026 Oral] 🎉Hallucination Begins Where Saliency Drops☆48Feb 12, 2026Updated last month
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr…☆35Jul 3, 2025Updated 9 months ago
- Awesome lists about all kinds of awesome skills to help you go out of 35 crisis, and most important, to tell you how to enjoy your life.☆19Jul 9, 2022Updated 3 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.☆41Apr 7, 2025Updated last year
- A zero-shot faithfulness evaluation metric for text summarization☆11Oct 17, 2023Updated 2 years ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆40Apr 11, 2025Updated last year
- [NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models☆49Mar 18, 2024Updated 2 years ago
- Here is a list of papers related to causal reinforcement learning, and I hope you can submit relevant missing papers in the issue.☆19Jan 23, 2024Updated 2 years ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…☆566Jan 4, 2025Updated last year
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models☆99Mar 22, 2024Updated 2 years ago
- Pytorch Implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.☆16May 1, 2025Updated 11 months ago
- This is an unofficial implementation to the EMNLP 2023 paper: Reading Order Matters: Information Extraction from Visually-rich Documents …☆16May 29, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"☆17Dec 1, 2023Updated 2 years ago
- The code for paper: "DC-Net: Divide-and-Conquer for Salient Object Detection"☆20Aug 30, 2024Updated last year
- [CVPR'25] Official code of paper "Mimic In-Context Learning for Multimodal Tasks"☆25Mar 10, 2026Updated last month
- Source code of the paper: Video Inpainting Localization with Contrastive Learning, IEEE SPL 2025.☆12Aug 9, 2025Updated 8 months ago
- Collections of object goal navigation papers in recent top-tier conferences.☆14Sep 24, 2022Updated 3 years ago
- This is the official repository of the EMNLP 2023 paper Reading Order Matters: Information Extraction from Visually-rich Documents by Tok…☆18Mar 15, 2024Updated 2 years ago
- ☆14Jun 10, 2025Updated 10 months ago