facebookresearch / mawsLinks

Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496

☆91

Alternatives and similar repositories for maws

Users that are interested in maws are comparing it to the libraries listed below

Sorting:

OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98Updated last year
kirill-vish / Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
☆101Updated 10 months ago
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆125Updated 11 months ago
lucidrains / MaMMUT-pytorch
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch
☆103Updated last year
OliverRensu / DeepMIM
[WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling
☆53Updated 2 months ago
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆100Updated 4 months ago
facebookresearch / r-mae
PyTorch implementation of R-MAE https//arxiv.org/abs/2306.05411
☆113Updated 2 years ago
TomerRonen34 / mixed-resolution-vit
☆51Updated last year
LAION-AI / scaling-laws-openclip
Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)
☆171Updated last month
renwang435 / video-ttt-release
Test-Time Training on Video Streams
☆64Updated 2 years ago
enyac-group / supmae
This is a offical PyTorch/GPU implementation of SupMAE.
☆78Updated 2 years ago
ChenDelong1999 / subobjects
Official repository of paper "Subobject-level Image Tokenization" (ICML-25)
☆80Updated last month
facebookresearch / meru
Code for the paper "Hyperbolic Image-Text Representations", Desai et al, ICML 2023
☆174Updated last year
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆127Updated 3 months ago
facebookresearch / long_seq_mae
code release of research paper "Exploring Long-Sequence Masked Autoencoders"
☆100Updated 2 years ago
QUVA-Lab / PIN
Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
☆26Updated 6 months ago
kaiyuyue / nxtp
[CVPR'24 Highlight] PyTorch Implementation of Object Recognition as Next Token Prediction
☆180Updated 3 months ago
mbanani / lgssl
[CVPR 2023] Learning Visual Representations via Language-Guided Sampling
☆149Updated 2 years ago
facebookresearch / MeMViT
Code Release for MeMViT Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition, CVPR 2022
☆148Updated 2 years ago
zhaoyue-zephyrus / AVION
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
☆132Updated last year
yuzhms / Streaming-Video-Model
[CVPR2023] Code for "Streaming Video Model"
☆79Updated 2 years ago
shashankvkt / DoRA_ICLR24
This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long …
☆90Updated last year
TonyLianLong / CrossMAE
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
☆115Updated 3 months ago
Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆120Updated last year
facebookresearch / CiT
Code for the paper titled "CiT Curation in Training for Effective Vision-Language Data".
☆78Updated 2 years ago
FocalNet / Networks-Beyond-Attention
A compilation of network architectures for vision and others without usage of self-attention mechanism
☆80Updated 2 years ago
facebookresearch / genecis
Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"
☆59Updated 2 years ago
TACJu / PartImageNet
Introduction and scripts for the paper "PartImageNet: A Large, High-Quality Dataset of Parts" (Ju He, Shuo Yang, Shaokang Yang, Adam Kort…
☆131Updated 4 months ago
Vibashan / Mask-free-OVIS
Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]
☆50Updated 6 months ago
facebookresearch / clip-rocket
Code release for "Improved baselines for vision-language pre-training"
☆60Updated last year