TonyLianLong/CrossMAE

Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders

☆135

Alternatives and similar repositories for CrossMAE

Users who are interested in CrossMAE are comparing it to the repositories listed below.

stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
naver-ai / lut
[ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners"
☆14 · Dec 1, 2024 · Updated last year
ml-jku / MIM-Refiner
A Contrastive Learning Boost from Intermediate Pre-Trained Representations
☆44 · Sep 19, 2024 · Updated last year
swimmiing / ACL-SSL
Repository of the IJCV'26 & WACV'24 paper
☆34 · Apr 27, 2026 · Updated 2 months ago
FengheTan9 / HySparK
[MICCAI 2024] HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training
☆22 · Nov 17, 2024 · Updated last year
para-lost / ECHO
Echo: "Constantly Improving Image Models Need Constantly Improving Benchmarks" (ICLR 2026)
☆20 · Jan 29, 2026 · Updated 5 months ago
TonyLianLong / igligen
Improved Implementation for Training GLIGEN: Open-Set Grounded Text-to-Image Generation
☆46 · Jun 1, 2024 · Updated 2 years ago
see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆47 · Jun 16, 2024 · Updated 2 years ago
stoneMo / CIGN
Official implementation for CIGN
☆17 · Sep 11, 2023 · Updated 2 years ago
huiwon-jang / RSP
Visual Representation Learning with Stochastic Frame Prediction (ICML 2024)
☆28 · Nov 27, 2024 · Updated last year
dynamic-lm / interrupt-lrm
🔥 [ICML 2026] Official implementation of "Are LRMs Interruptible?"
☆18 · Jun 18, 2026 · Updated last month
takerum / meta_sequential_prediction
☆18 · May 25, 2023 · Updated 3 years ago
RAIVNLab / MIMIC
MIMIC: Masked Image Modeling with Image Correspondences
☆16 · Jun 14, 2024 · Updated 2 years ago
EdisonLeeeee / Awesome-Masked-Autoencoders
A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).
☆868 · Jul 10, 2024 · Updated 2 years ago
ariesssxu / vta-ldm
☆61 · Jun 15, 2025 · Updated last year
hxixixh / mix-and-localize
☆23 · Mar 20, 2024 · Updated 2 years ago
visgym / VisGym
Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
☆114 · May 3, 2026 · Updated 2 months ago
hananshafi / MedContext
[MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"
☆14 · Nov 1, 2024 · Updated last year
TonyLianLong / RCF-UnsupVideoSeg
[CVPR 2023] Segmenting objects in videos without human annotations 🤯: Official implementation for Bootstrapping Objectness from Videos b…
☆40 · Nov 23, 2023 · Updated 2 years ago
RayYoh / Hammer
[CVPR 2026] Implementation of HAMMER: Harnessing MLLMs via Cross-Modal Integration for Intention-Driven 3D Affordance Grounding
☆20 · Apr 30, 2026 · Updated 2 months ago
bwconrad / can
PyTorch reimplementation of "A simple, efficient and scalable contrastive masked autoencoder for learning visual representations".
☆39 · Jan 10, 2023 · Updated 3 years ago
zhenyuw16 / CompAgent_code
Code release for our paper "Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation".
☆18 · Jan 30, 2024 · Updated 2 years ago
ml-jku / MAE-CT
☆33 · Apr 4, 2024 · Updated 2 years ago
hengRUC / VSP
☆24 · Sep 24, 2023 · Updated 2 years ago
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 · May 20, 2024 · Updated 2 years ago
aimagelab / MaPeT
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
☆16 · Jul 1, 2025 · Updated last year
yannqi / COMBO-AVS
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-…
☆40 · Apr 20, 2025 · Updated last year
stoneMo / AVGN
Official implementation for AVGN
☆42 · Mar 24, 2023 · Updated 3 years ago
Lupin1998 / Awesome-MIM
[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
☆354 · Apr 23, 2025 · Updated last year
ku-vai / TPoS
Repository for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)
☆25 · Dec 7, 2023 · Updated 2 years ago
ZhichengHuang / CMAE
The official implementation of CMAE https://arxiv.org/abs/2207.13532 and https://ieeexplore.ieee.org/document/10330745
☆121 · Jan 27, 2024 · Updated 2 years ago
casper9429-kth / Siamese-Masked-Autoencoders---Learning-and-Exploration
Course: DD2412 Deep Learning Advanced at KTH. Project by Casper, Magnus, and Friso. Focus: Self-supervised learning and computer vision wit…
☆12 · Dec 15, 2023 · Updated 2 years ago
TonyLianLong / LLM-groundedVideoDiffusion
[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper
☆172 · May 7, 2024 · Updated 2 years ago
kaist-ami / SoundBrush
☆14 · Dec 8, 2025 · Updated 7 months ago
facebookresearch / maws
Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496
☆93 · Mar 24, 2026 · Updated 4 months ago
zhaoyizhou1123 / mbrcsl
☆11 · Nov 18, 2023 · Updated 2 years ago
yahooo-m / VOS-Solution
ECCV 2024 STMA & CVPR 2024 1st MOSE & 1st VOT Challenge & 1st LSVOS v6
☆12 · Oct 16, 2024 · Updated last year
WikiChao / Ego-AV-Loc
[CVPR 2023] Egocentric Audio-Visual Object Localization
☆27 · Jan 6, 2024 · Updated 2 years ago
kdexd / coco-rem
Code for the paper "Benchmarking Object Detectors with COCO: A New Path Forward."
☆36 · Jul 13, 2024 · Updated 2 years ago