[ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"
☆45Feb 27, 2026Updated 2 months ago
Alternatives and similar repositories for TextCoT
Users that are interested in TextCoT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆145Jun 20, 2024Updated last year
- Official PyTorch implementation for ACM MM22 "UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior"☆25Aug 5, 2024Updated last year
- Inference, training and evaluation code for our paper "DocMatcher: Document Image Dewarping via Structural and Textual Line Matching" (WA…☆53Jul 1, 2025Updated 10 months ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆33Mar 10, 2026Updated last month
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Unofficial implementation of DocMAE (WIP): Document Image Rectification via Self-supervised Representation Learning☆20Dec 20, 2023Updated 2 years ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆49Mar 2, 2026Updated last month
- [TAI 2023] Appearance Enhancement for Camera-captured Document Images in the Wild☆56Aug 28, 2025Updated 8 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆92Nov 15, 2024Updated last year
- ROOT: VLM based System for Indoor Scene Understanding and Beyond☆41Jan 22, 2025Updated last year
- [ACM MM 2022] Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild☆25Aug 12, 2022Updated 3 years ago
- ☆80Jul 31, 2025Updated 9 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆21May 28, 2025Updated 11 months ago
- Dettoolchain: A new prompting paradigm to unleash detection ability of MLLM☆45Oct 12, 2024Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- The official repo for “WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?“☆73May 19, 2025Updated 11 months ago
- [ECCV] HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning☆26Sep 6, 2025Updated 7 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆175Sep 25, 2024Updated last year
- ☆21Oct 10, 2023Updated 2 years ago
- ☆14Sep 6, 2024Updated last year
- ☆29Feb 10, 2025Updated last year
- Code from our paper "Template-guided Illumination Correction for Document Images with Imperfect Geometric Reconstruction " (ICCVW) 2023.☆28Feb 7, 2024Updated 2 years ago
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr…☆36Jul 3, 2025Updated 9 months ago
- Awesome lists about all kinds of awesome skills to help you go out of 35 crisis, and most important, to tell you how to enjoy your life.☆19Jul 9, 2022Updated 3 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Official code for DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degra…☆43Mar 20, 2026Updated last month
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.☆41Apr 7, 2025Updated last year
- A zero-shot faithfulness evaluation metric for text summarization☆11Oct 17, 2023Updated 2 years ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆40Apr 11, 2025Updated last year
- [NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models☆49Mar 18, 2024Updated 2 years ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…☆574Jan 4, 2025Updated last year
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models☆99Mar 22, 2024Updated 2 years ago
- This is an unofficial implementation to the EMNLP 2023 paper: Reading Order Matters: Information Extraction from Visually-rich Documents …☆16May 29, 2024Updated last year
- [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"☆17Dec 1, 2023Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- The official code for “SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning”, ICCV, 20…☆33Jul 21, 2024Updated last year
- Source code of the paper: Video Inpainting Localization with Contrastive Learning, IEEE SPL 2025.☆12Aug 9, 2025Updated 8 months ago
- Collections of object goal navigation papers in recent top-tier conferences.☆14Sep 24, 2022Updated 3 years ago
- This is the official repository of the EMNLP 2023 paper Reading Order Matters: Information Extraction from Visually-rich Documents by Tok…☆18Mar 15, 2024Updated 2 years ago
- ☆14Jun 10, 2025Updated 10 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆368Apr 20, 2025Updated last year
- Scaffold Prompting to promote LMMs☆46Dec 16, 2024Updated last year