Sid2697 / Word-recognition-EmbedNet-CAB
Code implementation for our ICPR 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings".
☆21 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for Word-recognition-EmbedNet-CAB
- Code implementation for our DAS 2020 paper titled "Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval" ☆15 · Updated 3 months ago
- Code for "Weakly-supervised Fingerspelling Recognition in British Sign Language Videos", BMVC 2022. ☆10 · Updated last year
- Labeled Movie Trailer Dataset ☆16 · Updated 6 years ago
- [CVPR 2019] PyTorch code for Audio Visual Scene-Aware Dialog ☆34 · Updated 3 years ago
- Official code for the paper "Visual Speech Enhancement Without A Real Visual Stream" published at WACV 2021 ☆103 · Updated 5 months ago
- 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo ☆35 · Updated last year
- Weakly-supervised action segmentation in video ☆16 · Updated 2 years ago
- A unified framework to jointly model images, text, and human attention traces. ☆78 · Updated 3 years ago
- Shapley values for assessing the importance of each frame in a video ☆17 · Updated 3 years ago
- EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset ☆52 · Updated 3 years ago
- PyTorch implementation of DRAW: A Recurrent Neural Network For Image Generation, trained on a Devanagari dataset. ☆89 · Updated 4 years ago
- Collection of useful FFmpeg commands for processing audio and video files. ☆44 · Updated 5 years ago
- A repository containing the details of a natural language inference dataset in Hindi ☆11 · Updated 3 years ago
- Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch ☆59 · Updated 3 years ago
- AViD Dataset: Anonymized Videos from Diverse Countries ☆56 · Updated last year
- Tooling to experiment with multilingual machine translation for Indian languages. ☆21 · Updated 2 years ago
- "LipNet: End-to-End Sentence-level Lipreading" in PyTorch ☆65 · Updated 5 years ago
- I3D implementation in Keras + video preprocessing + visualization of results ☆42 · Updated last year
- Unofficial implementation of Google DeepMind's paper `Objects that Sound` ☆83 · Updated 6 years ago
- Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19 ☆33 · Updated 5 years ago
- CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering ☆74 · Updated 4 years ago
- Identifying Visible Actions in Lifestyle Vlogs ☆15 · Updated last year
- Implementations of Transformers for Video ☆24 · Updated 3 years ago
- ☆23 · Updated 3 years ago
- Audio Visual Instance Discrimination with Cross-Modal Agreement ☆127 · Updated 3 years ago
- ☆37 · Updated 2 years ago
- LipNet with Gluon ☆22 · Updated 2 years ago
- Text to Speech for Indic languages ☆48 · Updated 2 years ago
- ☆24 · Updated 5 years ago
- Code to train and evaluate the GeNeVA-GAN model for the GeNeVA task proposed in our ICCV 2019 paper "Tell, Draw, and Repeat: Generating a… ☆85 · Updated 2 years ago