Hassaan-Elahi / Dense-Captioning

Dense Captioning is a system of fully localized Deep Convolutional Neural networks to translate a video into natural language. It uses CNN (VGG16) for feature extraction from video and encoder-decoder models (LSTM & GRU) to generate descriptions utilizing transfer-learning approach.
11Updated 6 years ago

Alternatives and similar repositories for Dense-Captioning:

Users that are interested in Dense-Captioning are comparing it to the libraries listed below