We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.
☆47 · Aug 29, 2021 · Updated 4 years ago
Alternatives and similar repositories for GDT
Users who are interested in GDT are comparing it to the repositories listed below.
- a pytorch implementation for MoCo V3 ☆32 · Apr 14, 2021 · Updated 4 years ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆11 · Apr 18, 2025 · Updated 10 months ago
- Code for generating a single image pretraining dataset ☆13 · Aug 3, 2021 · Updated 4 years ago
- This repo covers the implementation for Labelling unlabelled videos from scratch with multi-modal self-supervision, which learns clusters… ☆117 · Apr 26, 2021 · Updated 4 years ago
- Official repository for the paper "Question Answering Infused Pre-training of General-Purpose Contextualized Representations" by Robin Ji… ☆15 · Aug 13, 2021 · Updated 4 years ago
- ☆13 · Jul 13, 2018 · Updated 7 years ago
- StarNet: Targeted Computation for Object Detection in Point Clouds ☆14 · Jan 28, 2020 · Updated 6 years ago
- Code and data for paper "(How) do Language Models Track State?" ☆20 · Mar 31, 2025 · Updated 11 months ago
- An unofficial PyTorch implementation of MixMatch - A Holistic Approach to Semi-Supervised Learning ☆14 · Aug 10, 2021 · Updated 4 years ago
- Localizing Visual Sounds the Hard Way ☆82 · Jul 6, 2022 · Updated 3 years ago
- ☆15 · Dec 11, 2021 · Updated 4 years ago
- ☆72 · Apr 19, 2020 · Updated 5 years ago
- [EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers… ☆20 · Jan 17, 2022 · Updated 4 years ago
- This repo contains the code for the recipe of the winning entry to the Ego4d VQ2D challenge at CVPR 2022. ☆41 · Mar 7, 2023 · Updated 2 years ago
- Codes for ACMMM 2021 paper "Fully Quantized Image Super-Resolution Networks". ☆19 · Jul 25, 2021 · Updated 4 years ago
- Official code of "Discover the Unknown Biased Attribute of an Image Classifier" (ICCV 2021) ☆21 · Oct 11, 2021 · Updated 4 years ago
- official code of CVPR'18 paper "learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks" ☆43 · Feb 14, 2019 · Updated 7 years ago
- ☆22 · Feb 25, 2020 · Updated 6 years ago
- Official pytorch implementation of I2I translation with low resolution conditioning ☆23 · Sep 2, 2021 · Updated 4 years ago
- Anonymize Faces for Privacy Preserving ☆26 · Apr 18, 2019 · Updated 6 years ago
- ☆24 · Feb 24, 2021 · Updated 5 years ago
- This is a repository for my work on the paper "Oracle Guided Image Synthesis with Relative Queries". ☆24 · May 6, 2022 · Updated 3 years ago
- roi_align_rotate, roi, nms, rotate, pytorch ☆19 · Apr 14, 2019 · Updated 6 years ago
- "Near, far: Patch-ordering enhances vision foundation models' scene understanding": A New SSL Post-Training Approach for Improving DINOv2… ☆30 · Apr 20, 2025 · Updated 10 months ago
- Implementation of “Video Deblurring by Fitting to Test Data“: https://arxiv.org/abs/2012.05228 ☆26 · Dec 11, 2020 · Updated 5 years ago
- Code for Visual Sound Localization in the Wild by Cross-Modal Interference Erasing (AAAI 2022). ☆29 · Feb 15, 2022 · Updated 4 years ago
- Official implementation of paper "ScatSimCLR: self-supervised contrastive learning with pretext task regularization for small-scale datas… ☆26 · Sep 7, 2021 · Updated 4 years ago
- Official implementation of "Towards Distribution-Agnostic Generalized Category Discovery" (NIPS 2023) ☆26 · Oct 21, 2023 · Updated 2 years ago
- Visual Correspondence Hallucination: Towards Geometric Reasoning (Under Review) ☆29 · Jan 28, 2023 · Updated 3 years ago
- ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions (SIGGRAPH 2022 - Journal Track) ☆112 · May 25, 2022 · Updated 3 years ago
- [NeurIPS 2022 Spotlight] Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator ☆30 · Oct 3, 2022 · Updated 3 years ago
- CVPR 2022 VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations ☆129 · Oct 20, 2022 · Updated 3 years ago
- ☆34 · Jun 12, 2024 · Updated last year
- ☆32 · Mar 23, 2024 · Updated last year
- CCQA A New Web-Scale Question Answering Dataset for Model Pre-Training ☆32 · Jul 20, 2022 · Updated 3 years ago
- An open source implementation of CLIP. ☆33 · Nov 7, 2022 · Updated 3 years ago
- The unofficial implementation of paper, "Objects that Sound", from ECCV 2018. ☆31 · Jan 29, 2024 · Updated 2 years ago
- Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper) ☆222 · Aug 23, 2022 · Updated 3 years ago
- The PASS dataset: pretrained models and how to get the data ☆267 · Jun 5, 2022 · Updated 3 years ago