MAEHCM / AET
Code for the AAAI 2023 paper “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models”
☆17 Updated last year
Related projects
Alternatives and complementary repositories for AET
- Contrast-guided Feature Adjustment Module for Visual Information Extraction ☆28 Updated last year
- ☆18 Updated last year
- ☆15 Updated 2 years ago
- arXiv 23 "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" ☆13 Updated 9 months ago
- CTE: Contextualized Table Extraction Dataset ☆17 Updated last year
- A huge dataset for Document Visual Question Answering ☆13 Updated 3 months ago
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua… ☆45 Updated last month
- [FGVC9-CVPR 2022] The second place solution for 2nd eBay eProduct Visual Search Challenge. ☆26 Updated 2 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆73 Updated last year
- ☆29 Updated last year
- PyTorch implementation of BMVC2022 paper Masked Vision-Language Transformers for Scene Text Recognition ☆29 Updated 2 years ago
- Datasets and Evaluation Scripts for CompHRDoc ☆25 Updated 7 months ago
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning ☆19 Updated 2 months ago
- PyTorch implementation of STR models for transfer learning in Indic Languages ☆16 Updated 3 years ago
- Official PyTorch implementation of `[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition` ☆17 Updated last year
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers ☆21 Updated 2 years ago
- ☆10 Updated last year
- running LayoutLMv2 ☆11 Updated 2 years ago
- [ICLR 2023] “Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian… ☆23 Updated last year
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language". ☆17 Updated 3 years ago
- ☆32 Updated 2 years ago
- ☆23 Updated 3 years ago
- Textual Visual Semantic Dataset for Text Spotting. CVPRW 2020 ☆10 Updated 2 years ago
- ☆33 Updated 6 months ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models". ☆32 Updated last year
- Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR2022) ☆12 Updated 2 years ago
- ☆15 Updated last year
- SciCap Dataset ☆48 Updated 3 years ago
- ☆11 Updated 5 months ago