AI-Application-and-Integration-Lab / Scene-Text-Detection-And-Recognition-Model_M503
☆13Updated last year
Alternatives and similar repositories for Scene-Text-Detection-And-Recognition-Model_M503:
Users who are interested in Scene-Text-Detection-And-Recognition-Model_M503 are comparing it to the repositories listed below
- ☆14Updated 2 years ago
- ☆131Updated last year
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …☆192Updated 6 months ago
- Document Artificial Intelligence☆157Updated 3 months ago
- An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…☆54Updated last year
- ☆182Updated 8 months ago
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).☆614Updated last week
- ☆69Updated 7 months ago
- Code for ICCV 2023 Paper: "ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction"☆52Updated last year
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo…☆325Updated 7 months ago
- Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files☆139Updated last year
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆51Updated 5 months ago
- ☆62Updated last year
- CTE: Contextualized Table Extraction Dataset☆17Updated 2 years ago
- Code for CVPR21 paper A Multiplexed Network for End-to-End, Multilingual OCR☆80Updated 2 years ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆95Updated 2 months ago
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning☆23Updated 6 months ago
- Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use.☆12Updated last year
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆137Updated 6 months ago
- An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".☆134Updated 3 weeks ago
- ☆172Updated last year
- Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications☆43Updated 4 months ago
- A collection of methods and examples for fine-tuning LLMs☆74Updated 8 months ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"☆200Updated last year
- ☆30Updated 3 months ago
- A large scale camera-taken table detection and recognition dataset.☆123Updated last year
- This is the official repository for Retrieval Augmented Visual Question Answering☆214Updated 3 months ago
- Collection of Tools and Papers related to Adapters / Parameter-Efficient Transfer Learning/ Fine-Tuning☆188Updated 10 months ago
- ☆25Updated last month
- ☆82Updated 3 months ago