AI-Application-and-Integration-Lab / Scene-Text-Detection-And-Recognition-Model_M503
☆13Updated last year
Alternatives and similar repositories for Scene-Text-Detection-And-Recognition-Model_M503:
Users who are interested in Scene-Text-Detection-And-Recognition-Model_M503 are comparing it to the repositories listed below
- ☆14Updated 2 years ago
- ☆132Updated last year
- Document Artificial Intelligence☆160Updated this week
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …☆196Updated 7 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆97Updated 3 months ago
- ☆191Updated last week
- An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…☆53Updated last year
- ☆38Updated 11 months ago
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning☆23Updated 7 months ago
- An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".☆139Updated last month
- A curated list of papers about key information extraction.☆91Updated 4 months ago
- Official Implementation of TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism☆25Updated last week
- ☆63Updated last year
- ☆86Updated 4 months ago
- PyTorch implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.☆16Updated last year
- Code for ICCV 2023 Paper: "ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction"☆53Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆84Updated 9 months ago
- OCR Annotations from Amazon Textract for Industry Documents Library☆103Updated 2 years ago
- https://dl.acm.org/doi/10.1145/3657281☆96Updated last year
- ☆28Updated 2 months ago
- Example codebase for fine-tuning layoutLMv3 on DocVQA☆50Updated 2 years ago
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"☆266Updated 10 months ago
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions☆43Updated last year
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo…☆332Updated 8 months ago
- An unofficial implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents☆37Updated last year
- Applied Deep Learning (2021 Spring) at National Taiwan University (NTU) CSIE☆9Updated 3 years ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆125Updated 10 months ago
- Implementation of the DocLLM paper for Llama models.☆13Updated 3 weeks ago
- ☆71Updated 8 months ago
- Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use.☆12Updated last year