siddsriv / Image-captioning
Using a CNN-LSTM hybrid network to generate captions for images
☆17 · Updated 5 years ago
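The CNN-LSTM hybrid described above is commonly structured as a convolutional encoder that compresses an image into a single embedding, which is fed as the first input step of an LSTM decoder over caption tokens. A minimal PyTorch sketch with toy dimensions and a small stand-in CNN (not this repository's actual code; a pretrained backbone such as ResNet would normally replace the encoder):

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Small convolutional encoder standing in for a pretrained CNN backbone."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global pooling -> (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, images):                  # images: (B, 3, H, W)
        feats = self.conv(images).flatten(1)    # (B, 64)
        return self.fc(feats)                   # (B, embed_dim)

class LSTMDecoder(nn.Module):
    """LSTM that consumes the image embedding as its first 'token'."""
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_embed, captions):             # captions: (B, T)
        tok = self.embed(captions)                      # (B, T, E)
        seq = torch.cat([img_embed.unsqueeze(1), tok], dim=1)  # (B, T+1, E)
        hidden, _ = self.lstm(seq)                      # (B, T+1, H)
        return self.out(hidden)                         # (B, T+1, vocab)

# Usage with random data (all sizes are illustrative assumptions)
enc, dec = CNNEncoder(), LSTMDecoder()
images = torch.randn(2, 3, 64, 64)
captions = torch.randint(0, 1000, (2, 5))
logits = dec(enc(images), captions)
print(logits.shape)  # torch.Size([2, 6, 1000])
```

At inference time the decoder would instead be unrolled one token at a time, feeding each predicted word back in (greedy or beam search) until an end-of-sentence token is produced.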
Alternatives and similar repositories for Image-captioning
Users interested in Image-captioning are comparing it to the libraries listed below.
- Labeled Movie Trailer Dataset ☆16 · Updated 7 years ago
- Implemented 3 different architectures to tackle the image captioning problem, i.e., Merged Encoder-Decoder, Bahdanau Attention, Transformer… ☆40 · Updated 4 years ago
- CNN+LSTM, attention-based, and MUTAN-based models for Visual Question Answering ☆75 · Updated 5 years ago
- PyTorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using the VQA v2.0 dataset for open-ended ta… ☆20 · Updated 4 years ago
- Repository for the Multilingual-VQA task created during the HuggingFace JAX/Flax community week. ☆34 · Updated 3 years ago
- Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022) ☆13 · Updated 2 years ago
- In-the-wild Question Answering ☆15 · Updated 2 years ago
- ☆12 · Updated last year
- ☆17 · Updated 4 years ago
- A PyTorch implementation of the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" ☆86 · Updated 5 years ago
- A Bert2Bert model which is able to generate headlines! ☆12 · Updated 4 years ago
- PyTorch implementation of NMT models along with custom tokenizers, models, and datasets ☆20 · Updated 2 years ago
- Visual Question Answering in PyTorch with various attention models ☆20 · Updated 5 years ago
- Deep learning model which uses computer vision and NLP to generate captions for images ☆14 · Updated 4 years ago
- Code for the paper "Video Gesture Analysis for Autism Spectrum Disorder Detection", ICPR 2018 ☆20 · Updated 6 years ago
- This repository contains code and metadata for the How2 dataset ☆177 · Updated 5 months ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆45 · Updated last year
- Visual Question Answering in the Medical Domain, VQA-Med 2019 ☆86 · Updated last year
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- Generate captions for images using a CNN-RNN model trained on the Microsoft Common Objects in COntext (MS COCO) dataset ☆79 · Updated 6 years ago
- Public repo for the paper "Modeling Intensification for Sign Language Generation: A Computational Approach" by Mert Inan*, Yang Zhong*, … ☆13 · Updated 3 years ago
- ☆59 · Updated 3 years ago
- Evaluation tools for image captioning, including BLEU, ROUGE-L, CIDEr, METEOR, and SPICE scores. ☆29 · Updated 2 years ago
- An implementation that downstreams pre-trained V+L models to VQA tasks. Currently supports VisualBERT, LXMERT, and UNITER ☆164 · Updated 2 years ago
- Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: prize-winning solution to the Hateful Memes Challenge. https://arxi… ☆60 · Updated last year
- Visualizing the learned space-time attention using Attention Rollout ☆37 · Updated 3 years ago
- menovideo: a PyTorch library for video action recognition and video understanding ☆29 · Updated 3 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers ☆51 · Updated 3 years ago
- Code and dataset for "MEmoR: A Dataset for Multimodal Emotion Reasoning in Videos", MM '20 ☆54 · Updated last year
- Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP features, accepted at the EMNLP 2022 Work… ☆52 · Updated last month