ronghanghu / mmfView external linksLinks
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45Aug 26, 2021Updated 4 years ago
Alternatives and similar repositories for mmf
Users that are interested in mmf are comparing it to the libraries listed below
Sorting:
- Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]☆57Apr 5, 2022Updated 3 years ago
- The imdb files with SBD-Trans OCR for TextVQA dataset.☆11Nov 30, 2021Updated 4 years ago
- Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.☆65Sep 15, 2021Updated 4 years ago
- ☆30May 7, 2021Updated 4 years ago
- ☆188May 8, 2024Updated last year
- [AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps☆24Mar 29, 2023Updated 2 years ago
- ☆15Oct 27, 2020Updated 5 years ago
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)☆73May 22, 2023Updated 2 years ago
- Code release for Hu et al., Language-Conditioned Graph Networks for Relational Reasoning. in ICCV, 2019☆92Aug 9, 2019Updated 6 years ago
- ☆16Dec 25, 2021Updated 4 years ago
- RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering☆10Nov 27, 2022Updated 3 years ago
- [ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents☆16Apr 4, 2024Updated last year
- Visual Navigation with Natural Multimodal Assistance (EMNLP 2019)☆29Jun 30, 2020Updated 5 years ago
- Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"☆187Apr 15, 2021Updated 4 years ago
- Vision-Language Pre-training for Image Captioning and Question Answering☆423Jan 18, 2022Updated 4 years ago
- MLPs for Vision and Langauge Modeling (Coming Soon)☆27Dec 9, 2021Updated 4 years ago
- Scene Text Aware Cross Modal Retrieval (StacMR)☆24Sep 3, 2021Updated 4 years ago
- Implementation of LaTr: Layout-aware transformer for scene-text VQA,a novel multimodal architecture for Scene Text Visual Question Answer…☆55Oct 30, 2024Updated last year
- Code and Data for ManyModalQA: Modality Disambiguation and QA over Diverse Inputs☆17Mar 2, 2020Updated 5 years ago
- Feature resources of "Diagnosing the Environment Bias in Vision-and-Language Navigation"☆16May 6, 2020Updated 5 years ago
- Counterfactual Samples Synthesizing for Robust VQA☆79Nov 24, 2022Updated 3 years ago
- Code for CVPR'18 "Grounding Referring Expressions in Images by Variational Context"☆30Jul 4, 2018Updated 7 years ago
- Torch Implementation of Speaker-Listener-Reinforcer for Referring Expression Generation and Comprehension☆34Mar 8, 2018Updated 7 years ago
- Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T…☆34May 14, 2020Updated 5 years ago
- SOIT: Segmenting Objects with Instance-Aware Transformers☆14Jun 6, 2022Updated 3 years ago
- ☆16May 6, 2021Updated 4 years ago
- A length-controllable and non-autoregressive image captioning model.☆69Jun 10, 2021Updated 4 years ago
- Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)☆123Dec 17, 2022Updated 3 years ago
- Code release for Learning to Assemble Neural Module Tree Networks for Visual Grounding (ICCV 2019)☆39Nov 23, 2019Updated 6 years ago
- Implementation of the paper ''Implicit Feature Refinement for Instance Segmentation''.☆20Oct 27, 2021Updated 4 years ago
- Document Visual Question Answering☆131Jul 30, 2020Updated 5 years ago
- A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)☆5,615Jan 12, 2026Updated last month
- CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment☆22Apr 15, 2022Updated 3 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- Code for "bootstrap, review, decode: using out-of-domain textual data to improve image captioning"☆21Dec 26, 2016Updated 9 years ago
- Contrastive Learning for Image Captioning☆51Feb 22, 2018Updated 7 years ago
- Improving One-stage Visual Grounding by Recursive Sub-query Construction, ECCV 2020☆89Sep 30, 2021Updated 4 years ago
- Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"☆800Jun 30, 2021Updated 4 years ago
- Official Code for 'EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification' - NAACL 2022☆23May 9, 2022Updated 3 years ago