facebookresearch / GDT

We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.
44Updated 3 years ago

Related projects: