AI-Application-and-Integration-Lab / Scene-Text-Detection-And-Recognition-Model_M503
☆13 Updated 8 months ago
Related projects
Alternatives and complementary repositories for Scene-Text-Detection-And-Recognition-Model_M503
- ☆14 Updated last year
- ☆127 Updated 9 months ago
- An open-source implementation for fine-tuning Qwen2-VL series by Alibaba Cloud. ☆117 Updated 2 weeks ago
- Document Artificial Intelligence ☆131 Updated last month
- ☆170 Updated 4 months ago
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆54 Updated 5 months ago
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train … ☆164 Updated last month
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆67 Updated 4 months ago
- ☆54 Updated 10 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆84 Updated 2 months ago
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ☆258 Updated 5 months ago
- Applied Deep Learning (2021 Spring) at National Taiwan University (NTU) CSIE ☆10 Updated 3 years ago
- Reading list for Multimodal Large Language Models ☆65 Updated last year
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆49 Updated 2 months ago
- The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++:… ☆248 Updated 3 months ago
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision) ☆121 Updated last year
- [ACL 2024] ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning. ☆107 Updated 2 months ago
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆23 Updated last year
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆75 Updated last month
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ☆288 Updated 2 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆110 Updated last month
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆18 Updated 8 months ago
- A collection of methods and examples for fine-tuning LLMs ☆69 Updated 4 months ago
- ☆35 Updated last year
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities. ☆69 Updated last month
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆80 Updated 5 months ago
- Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use. ☆12 Updated 8 months ago
- [TPAMI'24] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation ☆211 Updated last week
- ☆156 Updated 8 months ago
- [ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective ☆171 Updated last year