microsoft / ExtreMA
A self-supervised learning approach based on extremely large masking
☆29Updated 2 years ago
Alternatives and similar repositories for ExtreMA:
Users who are interested in ExtreMA are comparing it to the libraries listed below:
- [NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222☆51Updated last year
- A PyTorch implementation of the ICCV 2021 workshop paper SimDis: Simple Distillation Baselines for Improving Small Self-supervised Models☆14Updated 3 years ago
- [ICCV 2023] Label-Efficient Online Continual Object Detection in Streaming Video☆17Updated last year
- A Unified Framework for Video-Language Understanding☆56Updated last year
- ☆47Updated last year
- code release of research paper "Exploring Long-Sequence Masked Autoencoders"☆99Updated 2 years ago
- a novel data augmentation method across data modalities☆73Updated last year
- ☆44Updated 3 years ago
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"☆52Updated last year
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding☆75Updated last year
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022)☆13Updated 2 years ago
- ☆18Updated 2 years ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆24Updated 11 months ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"☆18Updated 2 years ago
- ☆57Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆31Updated last year
- ScaleNet: Searching for the Model to Scale (ECCV 2022)☆12Updated 2 years ago
- Example code for OCDA-Driving☆15Updated 4 years ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆16Updated last year
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning☆64Updated 2 years ago
- Compress conventional Vision-Language Pre-training data☆49Updated last year
- On-Device Domain Generalization☆41Updated 2 years ago
- Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency☆17Updated 3 years ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆28Updated 8 months ago
- Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639)☆34Updated 2 years ago
- Code release for the CVPR'23 paper titled "PartDistillation: Learning Parts from Instance Segmentation"☆59Updated last year
- [Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training☆21Updated 2 years ago
- ☆57Updated last year
- ☆52Updated last year