UCSC-VLAA / CRATE-alpha
This repository includes the official implementation of our paper "Scaling White-Box Transformers for Vision"
☆45 Updated 5 months ago
Related projects
Alternatives and complementary repositories for CRATE-alpha
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" ☆33 Updated 4 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆31 Updated last month
- Official code for ICLR 2024 paper Do Generated Data Always Help Contrastive Learning? ☆28 Updated 7 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 Updated 6 months ago
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆91 Updated last year
- This is a repo to track the latest autoregressive visual generation papers. ☆41 Updated 3 weeks ago
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆40 Updated 4 months ago
- The collection of awesome papers on alignment of diffusion models. ☆45 Updated last week
- This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision ☆62 Updated 4 months ago
- ☆100 Updated 7 months ago
- 🔥ImageFolder: Autoregressive Image Generation with Folded Tokens ☆53 Updated 3 weeks ago
- ☆48 Updated 4 months ago
- The official implementation of "Adapter is All You Need for Tuning Visual Tasks". ☆71 Updated 2 months ago
- ☆103 Updated 3 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆27 Updated 5 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆110 Updated 2 months ago
- Official Implementation of "Denoising Diffusion Semantic Segmentation with Mask Prior Modeling" ☆65 Updated last year
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition ☆69 Updated 2 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆19 Updated 3 weeks ago
- Code of our CVPR2024 paper - DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data ☆41 Updated 7 months ago
- ☆108 Updated 5 months ago
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆87 Updated 7 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆64 Updated this week
- Official repository of paper "Subobject-level Image Tokenization" ☆62 Updated 6 months ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆65 Updated 4 months ago
- ☆33 Updated 3 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆77 Updated 7 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆29 Updated 5 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆35 Updated this week