Sid2697 / Word-recognition-EmbedNet-CAB
Code implementation for our ICPR 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"
☆21Updated 3 years ago
Related projects:
- Code implementation for our DAS 2020 paper titled "Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval"☆14Updated last month
- Code for "Weakly-supervised Fingerspelling Recognition in British Sign Language Videos", BMVC 2022.☆10Updated last year
- Labeled Movie Trailer Dataset☆15Updated 6 years ago
- An easy-to-use app to visualise attentions of various VQA models.☆40Updated last year
- 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo☆35Updated last year
- Unofficial Implementation of Google Deepmind's paper `Objects that Sound`☆83Updated 6 years ago
- EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset☆52Updated 3 years ago
- Tooling to play around with multilingual machine translation for Indian Languages.☆21Updated 2 years ago
- PyTorch implementation of DRAW: A Recurrent Neural Network For Image Generation trained on Devanagari dataset.☆89Updated 4 years ago
- A unified framework to jointly model images, text, and human attention traces.☆78Updated 3 years ago
- menovideo: PyTorch library for video action recognition and video understanding☆28Updated 2 years ago
- An implementation of the paper "Contextualize, Show and Tell: A Neural Visual Storyteller." presented at the Storytelling Workshop, co-lo…☆33Updated 5 years ago
- LipNet with gluon☆22Updated last year
- A repository containing the details of a natural language inference dataset in Hindi☆11Updated 3 years ago
- Model submitted for the ICMI 2018 EmotiW Group-Level Emotion Recognition Challenge☆79Updated 5 years ago
- Text to Speech for Indic languages☆49Updated 2 years ago
- Official code for the paper "Visual Speech Enhancement Without A Real Visual Stream" published at WACV 2021☆102Updated 3 months ago
- A neural network architecture (CNN+LSTM) that automatically generates captions from images. The model uses ResNet architecture to trai…☆25Updated 4 years ago
- "LipNet: End-to-End Sentence-level Lipreading" in PyTorch☆64Updated 5 years ago
- AViD Dataset: Anonymized Videos from Diverse Countries☆55Updated last year
- Speech Recognition for Indic languages.☆11Updated 3 years ago
- Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19☆33Updated 5 years ago
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"☆78Updated 2 years ago
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages☆9Updated last year
- Shapley values for assessing the importance of each frame in a video☆17Updated 3 years ago
- Implementations of Transformers for Video☆24Updated 3 years ago