A huge dataset for Document Visual Question Answering
☆24 · Jul 29, 2024 · Updated last year
Alternatives and similar repositories for docmatix
Users who are interested in docmatix are comparing it to the libraries listed below.
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format. ☆26 · Feb 22, 2024 · Updated 2 years ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆215 · Aug 28, 2024 · Updated last year
- ☆101 · Sep 19, 2024 · Updated last year
- ☆52 · May 28, 2024 · Updated 2 years ago
- Official implementation for Dessurt: Document end-to-end self-supervised understanding and recognition transformer ☆62 · Jan 11, 2023 · Updated 3 years ago
- ☆12 · Mar 20, 2023 · Updated 3 years ago
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding ☆30 · Dec 18, 2025 · Updated 6 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆56 · Apr 18, 2025 · Updated last year
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning ☆23 · Sep 17, 2024 · Updated last year
- ☆25 · May 13, 2024 · Updated 2 years ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆54 · Dec 12, 2024 · Updated last year
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21) ☆30 · Jul 18, 2023 · Updated 2 years ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆24 · Jul 30, 2024 · Updated last year
- ☆45 · Jul 18, 2022 · Updated 3 years ago
- A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation ☆13 · Dec 22, 2022 · Updated 3 years ago
- ☆24 · May 26, 2026 · Updated last month
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025) ☆14 · Mar 6, 2025 · Updated last year
- ☆89 · Aug 18, 2024 · Updated last year
- ☆41 · Sep 9, 2025 · Updated 9 months ago
- OCR Annotations from Amazon Textract for Industry Documents Library ☆103 · Aug 20, 2022 · Updated 3 years ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 · Jun 26, 2024 · Updated 2 years ago
- Repository for Adaptive Mixture MCL ☆12 · Jun 6, 2022 · Updated 4 years ago
- [ICML 2024] Code release for "On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm" ☆11 · Feb 20, 2025 · Updated last year
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆424 · May 5, 2025 · Updated last year
- InnerEye dataset creation tool for the InnerEye-DeepLearning library. Transforms DICOM data into masks for training deep learning models. ☆21 · Mar 21, 2024 · Updated 2 years ago
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024) ☆12 · May 26, 2024 · Updated 2 years ago
- [CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval ☆37 · Sep 12, 2025 · Updated 9 months ago
- Official repository for "TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving" ☆23 · Sep 1, 2025 · Updated 10 months ago
- Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset ☆13 · Nov 19, 2022 · Updated 3 years ago
- An unofficial implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆37 · Sep 9, 2023 · Updated 2 years ago
- ☆14 · Feb 21, 2024 · Updated 2 years ago
- ☆40 · Aug 18, 2021 · Updated 4 years ago
- The repo of the Doc2SoarGraph framework ☆10 · Sep 17, 2024 · Updated last year
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023) ☆58 · Apr 15, 2024 · Updated 2 years ago
- [NeurIPS 2023] Code release for "Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity" ☆19 · Oct 19, 2023 · Updated 2 years ago
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆50 · Jun 4, 2025 · Updated last year
- NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation ☆13 · May 24, 2025 · Updated last year
- ☆56 · Jun 4, 2025 · Updated last year
- [NeurIPS 2023] Official PyTorch code for LOVM: Language-Only Vision Model Selection ☆21 · Feb 3, 2024 · Updated 2 years ago