zhiqic / ChartReaderLinks
[ICCV 2023] ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules
☆23Updated last year
Alternatives and similar repositories for ChartReader
Users that are interested in ChartReader are comparing it to the libraries listed below
Sorting:
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023☆46Updated last year
- ☆65Updated last year
- ☆41Updated last year
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023☆25Updated last year
- ☆75Updated 10 months ago
- ☆26Updated 11 months ago
- Dataset introduced in PlotQA: Reasoning over Scientific Plots☆77Updated 2 years ago
- ☆19Updated 2 months ago
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)☆16Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆91Updated last year
- ACL 2025: Synthetic data generation pipelines for text-rich images.☆86Updated 3 months ago
- ☆32Updated last year
- ☆67Updated 11 months ago
- [ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)☆42Updated last year
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo☆34Updated 10 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆62Updated 2 months ago
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆59Updated last month
- ☆28Updated last year
- [ICDAR 2023] (Oral) An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation☆73Updated 9 months ago
- A Survey on video and language understanding.☆50Updated 2 years ago
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆145Updated 2 months ago
- [ACL'25 Main] ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation☆51Updated last week
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆65Updated 9 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated 9 months ago
- Official implementation for Dessurt: Document end-to-end self-supervised understanding and recognition transformer☆60Updated 2 years ago
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆124Updated last year
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆96Updated 5 months ago
- Official repository of "Chatting Makes Perfect: Chat-based Image Retrieval"☆30Updated 4 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆83Updated 11 months ago
- Implementation of LaTr: Layout-aware transformer for scene-text VQA,a novel multimodal architecture for Scene Text Visual Question Answer…☆53Updated 7 months ago