Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_feature.py
☆13 · Jan 30, 2020 · Updated 6 years ago
Alternatives and similar repositories for vqa-maskrcnn-benchmark-m4c
Users that are interested in vqa-maskrcnn-benchmark-m4c are comparing it to the libraries listed below.
- The imdb files with SBD-Trans OCR for TextVQA dataset. ☆11 · Nov 30, 2021 · Updated 4 years ago
- Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021] ☆57 · Apr 5, 2022 · Updated 3 years ago
- [AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps ☆24 · Mar 29, 2023 · Updated 3 years ago
- VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021) ☆57 · Mar 31, 2025 · Updated 11 months ago
- ☆188 · May 8, 2024 · Updated last year
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral) ☆72 · May 22, 2023 · Updated 2 years ago
- RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering ☆10 · Nov 27, 2022 · Updated 3 years ago
- [ACL'21] Dialogue Response Selection with Hierarchical Curriculum Learning ☆21 · Nov 15, 2022 · Updated 3 years ago
- Extension of hLSTMat ☆19 · Apr 15, 2021 · Updated 4 years ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆21 · Aug 1, 2025 · Updated 7 months ago
- ☆10 · Aug 21, 2021 · Updated 4 years ago
- Extension of Self-Supervised Temporal Hashing ☆14 · Apr 15, 2021 · Updated 4 years ago
- The paper of "Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning" accepted in International Joint Conference on Arti… ☆16 · Jun 29, 2017 · Updated 8 years ago
- STVQA and TextVQA OCR results from Amazon Text in Image pipeline ☆12 · Jul 18, 2022 · Updated 3 years ago
- ☆10 · Mar 31, 2025 · Updated 11 months ago
- A simple Python implementation of the ngram sunburst (nested pie chart) visualization shown in the CoQA paper ☆13 · Mar 12, 2019 · Updated 7 years ago
- OCR Annotations from Amazon Textract for Industry Documents Library ☆103 · Aug 20, 2022 · Updated 3 years ago
- ACL'2024-Main: Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Languag… ☆12 · Sep 19, 2025 · Updated 6 months ago
- Multimodal retrieval in art with context embeddings. ☆11 · Jan 5, 2022 · Updated 4 years ago
- ☆19 · Mar 5, 2025 · Updated last year
- Gujarati font overview ☆19 · Jan 11, 2014 · Updated 12 years ago
- This repository offers a comprehensive overview of existing datasets and methods in the field of change captioning. ☆17 · Sep 2, 2025 · Updated 6 months ago
- This repo contains some useful metadata for the Fashion IQ challenge: https://sites.google.com/view/lingir/fashion-iq ☆15 · Jun 28, 2019 · Updated 6 years ago
- A modular framework for Visual Question Answering research by the FAIR A-STAR team ☆45 · Aug 26, 2021 · Updated 4 years ago
- JSON Schema format for storing datasets details, documents processed contents, and documents annotations in the document understanding do… ☆13 · Nov 5, 2024 · Updated last year
- An ambiguous subtitles dataset for visual scene-aware machine translation ☆14 · Oct 17, 2022 · Updated 3 years ago
- ☆10 · Oct 1, 2020 · Updated 5 years ago
- ☆19 · Sep 14, 2024 · Updated last year
- Pytorch unofficial implementation of CMT ☆13 · Jul 16, 2021 · Updated 4 years ago
- ☆11 · Oct 2, 2024 · Updated last year
- Multi-span Style Extraction for Generative Reading Comprehension ☆10 · Apr 2, 2021 · Updated 4 years ago
- Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023) ☆19 · Nov 28, 2022 · Updated 3 years ago
- Code and Data for ManyModalQA: Modality Disambiguation and QA over Diverse Inputs ☆17 · Mar 2, 2020 · Updated 6 years ago
- A package that makes Virtual Makeup easy. ☆19 · Jun 24, 2021 · Updated 4 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025 ☆17 · Jul 14, 2025 · Updated 8 months ago
- A tool for generating CRNN training datasets, mainly targeting Simplified Chinese. ☆15 · Apr 19, 2022 · Updated 3 years ago
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie… ☆23 · Jun 6, 2025 · Updated 9 months ago
- The official code of "CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval" ☆15 · Sep 19, 2024 · Updated last year
- [IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering ☆17 · Feb 16, 2026 · Updated last month