siddsriv / Image-captioning
Using a CNN-LSTM hybrid network to generate captions for images
☆17 · Updated 5 years ago
Alternatives and similar repositories for Image-captioning
Users who are interested in Image-captioning are comparing it to the libraries listed below.
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week. ☆34 · Updated 3 years ago
- PyTorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta… ☆20 · Updated 4 years ago
- Implemented 3 different architectures to tackle the Image Caption problem, i.e., Merged Encoder-Decoder - Bahdanau Attention - Transformer… ☆40 · Updated 4 years ago
- PyTorch implementation of image captioning using transformer-based model. ☆66 · Updated 2 years ago
- ☆17 · Updated 3 years ago
- ☆22 · Updated last year
- Exploring multimodal fusion-type transformer models for visual question answering (on DAQUAR dataset) ☆35 · Updated 3 years ago
- Public repo for the paper: "Modeling Intensification for Sign Language Generation: A Computational Approach" by Mert Inan*, Yang Zhong*, … ☆13 · Updated 3 years ago
- Code for the paper 'Video Gesture Analysis for Autism Spectrum Disorder Detection', ICPR 2018 ☆20 · Updated 6 years ago
- Image Captioning using CNN and Transformer. ☆53 · Updated 3 years ago
- BERT + Image Captioning ☆133 · Updated 4 years ago
- Deep Learning model which uses Computer Vision and NLP to generate captions for images ☆14 · Updated 4 years ago
- Face-Cap: Image Captioning using Facial Expression Analysis ☆16 · Updated 5 years ago
- In-the-wild Question Answering ☆15 · Updated 2 years ago
- PyTorch implementation of NMT models along with custom tokenizers, models, and datasets ☆20 · Updated 2 years ago
- Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP features - Accepted at EMNLP 2022 Work… ☆52 · Updated 2 months ago
- ☆44 · Updated 3 years ago
- Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset ☆18 · Updated 5 years ago
- A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ☆85 · Updated 5 years ago
- Used LSTM on Flickr dataset ☆12 · Updated 7 years ago
- Visual Question Answering in the Medical Domain VQA-Med 2019 ☆87 · Updated last year
- Code for our Source-free Unsupervised Video Domain Adaptation Paper ☆9 · Updated 5 months ago
- Reproduced code for Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation, ICVGIP'22 ☆22 · Updated last year
- Natural Language Processing ☆28 · Updated last year
- [AAAI 2023 (Oral)] CrissCross: Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity ☆25 · Updated last year
- Multimodal Meme Classification: Identifying Offensive Content in Image and Text ☆70 · Updated 2 years ago
- Using LSTM or Transformer to solve Image Captioning in PyTorch ☆78 · Updated 3 years ago
- ☆60 · Updated 4 years ago
- Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022) ☆14 · Updated 2 years ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆45 · Updated last year