rubenpt91 / MP-DocVQA-Framework
☆69Jan 9, 2024Updated 2 years ago
Alternatives and similar repositories for MP-DocVQA-Framework
Users that are interested in MP-DocVQA-Framework are comparing it to the libraries listed below
- ☆17Jun 12, 2024Updated last year
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning☆23Sep 17, 2024Updated last year
- Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.☆18Apr 23, 2023Updated 2 years ago
- ☆45Jul 18, 2022Updated 3 years ago
- ☆51May 28, 2024Updated last year
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆63May 15, 2025Updated 8 months ago
- Official repository of the paper: "A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition"☆26Jul 10, 2023Updated 2 years ago
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)☆104Mar 31, 2025Updated 10 months ago
- OCR Annotations from Amazon Textract for Industry Documents Library☆103Aug 20, 2022Updated 3 years ago
- ☆22Dec 8, 2022Updated 3 years ago
- The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models.☆39Dec 7, 2023Updated 2 years ago
- VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)☆57Mar 31, 2025Updated 10 months ago
- Data Programming for Text Detection in Documents using SPEAR☆12Mar 26, 2025Updated 10 months ago
- DocILE: Document Information Localization and Extraction Benchmark☆139May 15, 2024Updated last year
- Document Visual Question Answering☆131Jul 30, 2020Updated 5 years ago
- ☆40Aug 18, 2021Updated 4 years ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆120Sep 28, 2025Updated 4 months ago
- Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023☆109Oct 24, 2023Updated 2 years ago
- Official implementation for "GLASS: Global to Local Attention for Scene-Text Spotting" (ECCV'22)☆102Jun 28, 2024Updated last year
- The official PyTorch implementation of SEMv3.☆51May 26, 2024Updated last year
- PyTorch implementation of BMVC2022 paper Masked Vision-Language Transformers for Scene Text Recognition☆29Nov 11, 2022Updated 3 years ago
- ☆142Feb 13, 2024Updated 2 years ago
- Code for CVPR21 paper A Multiplexed Network for End-to-End, Multilingual OCR☆80Dec 2, 2022Updated 3 years ago
- Tool to parse wiki tables from the HTML dump of Wikipedia☆11Jun 12, 2022Updated 3 years ago
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023☆46Jun 11, 2024Updated last year
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated last year
- Contrast-guided Feature Adjustment Module for Visual Information Extraction☆30May 23, 2023Updated 2 years ago
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023☆28Jul 12, 2023Updated 2 years ago
- ☆161Dec 27, 2022Updated 3 years ago
- Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022☆13Nov 25, 2022Updated 3 years ago
- ☆16Jan 10, 2025Updated last year
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"☆15Feb 8, 2024Updated 2 years ago
- The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023)☆27Sep 3, 2023Updated 2 years ago
- Repo for the paper: Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework☆14May 31, 2023Updated 2 years ago
- Cross-lingual learning in scene text recognition (ICASSP2024)☆18Sep 29, 2024Updated last year
- ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models☆16Sep 27, 2024Updated last year
- Example codebase for fine-tuning layoutLMv3 on DocVQA☆52Sep 19, 2022Updated 3 years ago
- Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT Text detection (Pytorch), included converter from Pytorch -> O…☆33Aug 18, 2021Updated 4 years ago
- ☆238Apr 18, 2025Updated 9 months ago