samakos / Document-AI-
☆14 · Updated last year
Alternatives and similar repositories for Document-AI-:
Users who are interested in Document-AI- are comparing it to the libraries listed below:
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆14 · Updated 4 months ago
- [ICDAR 2024] (Best Student Paper🏆) Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation ☆13 · Updated 6 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling ☆32 · Updated last year
- Official PyTorch implementation of `[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition` ☆17 · Updated last year
- Code for AAAI 2023 Paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 · Updated 2 years ago
- Masked Vision-Language Transformer in Fashion ☆33 · Updated last year
- Contrast-guided Feature Adjustment Module for Visual Information Extraction ☆28 · Updated last year
- Datasets and Evaluation Scripts for CompHRDoc ☆35 · Updated last month
- ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting ☆33 · Updated 2 weeks ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆35 · Updated 7 months ago
- An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis" ☆77 · Updated last year
- CTE: Contextualized Table Extraction Dataset ☆17 · Updated 2 years ago
- The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting" ☆17 · Updated last year
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers. ☆14 · Updated last year
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆79 · Updated 2 years ago
- DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction ☆20 · Updated last year
- [AAAI 2024] SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression ☆63 · Updated last month
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆10 · Updated 5 months ago
- The official code of Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition (IJCAI2023) ☆27 · Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 7 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆20 · Updated last week
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023 ☆45 · Updated 9 months ago
- Tool to parse wiki tables from the HTML dump of Wikipedia ☆11 · Updated 2 years ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆64 · Updated 6 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆58 · Updated last month
- [ICDAR 2023] (Oral) An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation ☆70 · Updated 6 months ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated 2 weeks ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆21 · Updated 2 months ago