qiy20 / SAIM
Official PyTorch implementation of "Exploring Stochastic Autoregressive Image Modeling for Visual Representation", accepted by AAAI 2023.
☆15Updated 2 years ago
Alternatives and similar repositories for SAIM
Users who are interested in SAIM are comparing it to the repositories listed below:
- [ICCV2023] DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models☆178Updated last year
- (CVPR2023/TPAMI2024) Integrally Pre-Trained Transformer Pyramid Networks -- A Hierarchical Vision Transformer for Masked Image Modeling☆195Updated 11 months ago
- Official Code of Paper "Reversible Column Networks" "RevColv2"☆263Updated last year
- Project Page for "Multi-Task Dense Prediction via Mixture of Low-Rank Experts"☆82Updated last month
- [NeurIPS 2023] Masked Image Residual Learning for Scaling Deeper Vision Transformers☆19Updated last year
- ☆91Updated 2 years ago
- [CVPR 2023] Official repository of Generative Semantic Segmentation☆217Updated last year
- ☆134Updated last year
- [CVPR 2023] Explicit Visual Prompting for Low-Level Structure Segmentations☆209Updated last year
- ☆16Updated 8 months ago
- This is a PyTorch implementation of "Context AutoEncoder for Self-Supervised Representation Learning"☆114Updated last year
- [CVPR 2023] CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation☆198Updated 10 months ago
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference☆160Updated 9 months ago
- The official implementation of [CVPR 2025] "5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks".☆343Updated last month
- The official implementation of CMAE https://arxiv.org/abs/2207.13532 and https://ieeexplore.ieee.org/document/10330745☆107Updated last year
- ☆58Updated 11 months ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…☆108Updated this week
- HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model☆54Updated last week
- ☆32Updated last year
- ☆260Updated 2 years ago
- MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning☆143Updated 2 years ago
- ☆67Updated 2 years ago
- The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"☆217Updated last year
- [ICLR 2023] Masked Frequency Modeling for Self-Supervised Visual Pre-Training☆76Updated 2 years ago
- Text-Image Alignment for Diffusion-based Perception (TADP) - CVPR 2024☆36Updated 10 months ago
- This is a PyTorch implementation of "Context AutoEncoder for Self-Supervised Representation Learning"☆198Updated 2 years ago
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction☆190Updated last year
- [ICCV2023] This is an official implementation for "Scale-Aware Modulation Meet Transformer".☆210Updated last year
- [NeurIPS2024 Spotlight] The official implementation of MambaTree: Tree Topology is All You Need in State Space Model☆97Updated last year
- ☆87Updated last year