GbotHQ / ocr-dataset-rendering
☆21 Updated last year
Alternatives and similar repositories for ocr-dataset-rendering:
Users interested in ocr-dataset-rendering are comparing it to the repositories listed below.
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua… ☆50 Updated last month
- A huge dataset for Document Visual Question Answering ☆15 Updated 5 months ago
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo ☆32 Updated 5 months ago
- ☆89 Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆24 Updated last year
- Towards Video Text Visual Question Answering: Benchmark and Baseline ☆38 Updated 10 months ago
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023) ☆83 Updated last year
- ☆12 Updated 7 months ago
- ☆26 Updated 5 months ago
- ☆47 Updated last year
- ☆58 Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆88 Updated 9 months ago
- ☆36 Updated 7 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆89 Updated last week
- Code for AAAI 2023 Paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 Updated 2 years ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 Updated 7 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆63 Updated 4 months ago
- ☆59 Updated 11 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆59 Updated 3 months ago
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023 ☆44 Updated 7 months ago
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision) ☆121 Updated last year
- Index of URLs to pdf files all over the internet and scripts ☆21 Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆63 Updated 6 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆31 Updated 6 months ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆75 Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆78 Updated 3 weeks ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆51 Updated 2 months ago
- M4 experiment logbook ☆56 Updated last year
- Official code for infimm-hd ☆15 Updated 4 months ago
- ☆73 Updated 10 months ago