matakshay / Neural_Image_Caption_Generator
Deep Learning model which uses Computer Vision and NLP to generate captions for images
☆14Updated 3 years ago
Related projects:
- ☆44Updated 3 years ago
- PyTorch code for EMNLP 2020 paper "X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers"☆50Updated 3 years ago
- 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo☆35Updated last year
- ☆20Updated last year
- Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations☆14Updated 2 years ago
- COMIC: This is the code repo of our TMM2019 work titled "COMIC: Towards a Compact Image Captioning Model with Attention".☆15Updated 3 years ago
- A dataset of crowdsourced ratings for machine-generated image captions☆31Updated 5 years ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 3 years ago
- Procedural Reasoning Networks☆7Updated 3 years ago
- Cross-modal Coherence Modeling for Caption Generation☆11Updated 4 years ago
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages☆9Updated last year
- opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan.☆11Updated 3 years ago
- CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations☆25Updated 10 months ago
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"☆78Updated 2 years ago
- PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)☆56Updated last year
- Transfer Learning via Unsupervised Task Discovery for Visual Question Answering☆32Updated 5 years ago
- ☆24Updated 3 years ago
- [EMNLP 2021] Code and data for our paper "Visually Grounded Reasoning across Languages and Cultures"☆28Updated 2 years ago
- Code for DiagNet: Bridging Text and Image☆10Updated 5 years ago
- Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"☆27Updated last year
- A collection of models for image<->text generation in ACM MM 2021.☆64Updated 2 years ago
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning☆89Updated 5 months ago
- ☆45Updated last year
- Code for the paper: Saying No is An Art: Contextualized Fallback Responses for Unanswerable Dialogue Queries☆19Updated 2 years ago
- Implementation of "MULE: Multimodal Universal Language Embedding"☆15Updated 4 years ago
- Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch☆59Updated 3 years ago
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT…☆21Updated 3 years ago
- Identifying Visible Actions in Lifestyle Vlogs☆15Updated last year
- In-the-wild Question Answering☆15Updated last year
- Official Github Repo for the Findings of EMNLP 2021 paper "An animated picture says at least a thousand words: Selecting Gif-based Replie…☆32Updated 2 years ago