ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
☆57Jun 13, 2023Updated 3 years ago
Alternatives and similar repositories for rosita
Users that are interested in rosita are comparing it to the libraries listed below.
- A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning☆25Sep 4, 2020Updated 5 years ago
- A PyTorch reimplementation of bottom-up-attention models☆301Apr 7, 2022Updated 4 years ago
- ☆27Oct 7, 2021Updated 4 years ago
- A reading list of papers about Visual Question Answering.☆35Aug 17, 2022Updated 3 years ago
- [Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph☆72Feb 9, 2024Updated 2 years ago
- ☆10Jul 23, 2021Updated 4 years ago
- Code for ViLBERTScore in EMNLP Eval4NLP☆18Oct 27, 2022Updated 3 years ago
- Deep Multimodal Neural Architecture Search☆29Nov 15, 2020Updated 5 years ago
- Official code repo for "ProTo: program-guided Transformers for Program-guided Tasks"☆21Apr 15, 2022Updated 4 years ago
- Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".☆279Jun 14, 2025Updated last year
- The good practice in the VQA system such as POS-tag attention, structured triplet learning and triplet attention is very general and can be…☆19Jan 23, 2018Updated 8 years ago
- Grid features pre-training code for visual question answering☆269Sep 17, 2021Updated 4 years ago
- project page for VinVL☆360Jul 26, 2023Updated 2 years ago
- Deep Modular Co-Attention Networks for Visual Question Answering☆458Dec 16, 2020Updated 5 years ago
- source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT☆72Nov 14, 2022Updated 3 years ago
- Implementation of Mutan+ArticleNet on OKVQA☆10Jan 11, 2021Updated 5 years ago
- Code for Knowledge-Embedded Routing Network for Scene Graph Generation (CVPR 2019)☆122Aug 17, 2022Updated 3 years ago
- Media Intelligence Laboratory Machine Learning / Deep Learning Summer School☆17Oct 1, 2019Updated 6 years ago
- A collection of models for image<->text generation in ACM MM 2021.☆67Oct 31, 2021Updated 4 years ago
- ☆22Aug 10, 2020Updated 5 years ago
- Oscar and VinVL☆1,053Aug 28, 2023Updated 2 years ago
- The official code for "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" (IEEE Access, 2021…☆18Oct 21, 2022Updated 3 years ago
- Code for paper: Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection. Project page: http://www.juanrojas.net/v…☆19Mar 3, 2021Updated 5 years ago
- Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"☆43May 13, 2021Updated 5 years ago
- [ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383☆419Oct 28, 2022Updated 3 years ago
- implement gat with batch☆10Nov 28, 2020Updated 5 years ago
- This is the code of ECCV 2022 (Oral) paper "Fine-Grained Scene Graph Generation with Data Transfer".☆103Jan 24, 2023Updated 3 years ago
- Weakly Supervised Grounding for VQA in Vision-Language Transformers☆16May 6, 2023Updated 3 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- METER: A Multimodal End-to-end TransformER Framework☆377Nov 16, 2022Updated 3 years ago
- ☆15May 10, 2021Updated 5 years ago
- ☆18Jun 10, 2024Updated 2 years ago
- Code accompanying the paper "Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs" (Chen et al., …☆200Dec 1, 2022Updated 3 years ago
- [CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning☆208Sep 30, 2022Updated 3 years ago
- Implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering☆31Apr 30, 2024Updated 2 years ago
- PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)☆56Feb 6, 2023Updated 3 years ago
- A length-controllable and non-autoregressive image captioning model.☆69Jun 10, 2021Updated 5 years ago
- ☆25Jun 25, 2021Updated 4 years ago
- Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)☆26Jul 16, 2025Updated 11 months ago