siddsriv / Image-captioning
Using a CNN-LSTM hybrid network to generate captions for images
☆18Updated 6 years ago
Alternatives and similar repositories for Image-captioning
Users that are interested in Image-captioning are comparing it to the libraries listed below
- PyTorch implementation of the "Attention Is All You Need" Transformer paper for machine translation from French queries to English.☆70Updated 5 years ago
- Implemented 3 different architectures to tackle the image captioning problem, i.e., Merged Encoder-Decoder - Bahdanau Attention - Transformer…☆40Updated 4 years ago
- 1st Place Public Leaderboard Solution for ERC2019☆70Updated 5 years ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 4 years ago
- ☆17Updated 4 years ago
- ☆44Updated 4 years ago
- Code release for ICCV 2021 paper "Anticipative Video Transformer"☆155Updated 3 years ago
- Labeled Movie Trailer Dataset☆16Updated 7 years ago
- AIMS 2020, class on Visual Recognition☆23Updated 5 years ago
- Visualizing the learned space-time attention using Attention Rollout☆39Updated 3 years ago
- PyTorch implementation of NMT models along with custom tokenizers, models, and datasets☆20Updated 3 years ago
- menovideo: pytorch library for video action recognition and video understanding☆29Updated 4 years ago
- Exploring multimodal fusion-type transformer models for visual question answering (on DAQUAR dataset)☆37Updated 3 years ago
- Implementations of various GAN architectures using PyTorch Lightning☆26Updated 5 years ago
- Contains additional materials for two keras.io blog posts.☆17Updated 4 years ago
- 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo☆35Updated 2 years ago
- An educational step-by-step implementation of SimCLR that accompanies the blog post☆31Updated 3 years ago
- Easiest way of fine-tuning HuggingFace video classification models☆146Updated 2 years ago
- The Transformer in PyTorch☆13Updated last year
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"☆83Updated 3 years ago
- Recurrent neural networks: building a custom LSTM/GRU cell in PyTorch☆28Updated 5 years ago
- Implementation of the Benchmark Approaches for Medical Instructional Video Classification (MedVidCL) and Medical Video Question Answering…☆30Updated 2 years ago
- A summarization of Transformer-based architectures for CV tasks, including image classification, object detection, segmentation, and Few-…☆115Updated 3 years ago
- This repository shows how to implement a basic model for multimodal entailment.☆10Updated 4 years ago
- Code for the paper 'Video Gesture Analysis for Autism Spectrum Disorder Detection', ICPR 2018☆24Updated 6 years ago
- Deep Learning model which uses Computer Vision and NLP to generate captions for images☆15Updated 5 years ago
- ☆65Updated 3 years ago
- Implementation of modern data augmentation techniques in TensorFlow 2.x to be used in your training pipeline.☆34Updated 5 years ago
- Toloka Visual Question Answering Challenge at WSDM Cup 2023☆31Updated last year
- The repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-languag…☆231Updated 3 years ago