tingyaohsu / SciCap
SciCap Dataset
☆49 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for SciCap
- VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021) ☆51 · Updated 3 years ago
- Dataset introduced in PlotQA: Reasoning over Scientific Plots ☆70 · Updated last year
- The code related to the baselines from NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark" ☆35 · Updated last year
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" ☆23 · Updated 5 months ago
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023) ☆78 · Updated last year
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning ☆133 · Updated last year
- Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev… ☆27 · Updated 10 months ago
- Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language" ☆61 · Updated 2 years ago
- Code for ACL2023 paper: Pre-Training to Learn in Context ☆107 · Updated 3 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆110 · Updated last month
- Official repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆57 · Updated 4 months ago
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated last year
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆84 · Updated 2 months ago
- Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering ☆49 · Updated 2 years ago
- Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021) ☆28 · Updated 3 years ago
- PyTorch implementation of Value Retrieval with Arbitrary Queries for Form-like Documents ☆16 · Updated last year
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning ☆19 · Updated 2 months ago
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022) ☆40 · Updated 2 years ago
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings ☆53 · Updated 5 months ago
- [EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers… ☆20 · Updated 2 years ago
- Chart-to-Text: Generating Natural Language Explanations for Charts by Adapting the Transformer Model ☆149 · Updated last year
- Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer… ☆52 · Updated 3 weeks ago