Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
☆232 Feb 13, 2026 Updated 2 months ago
Alternatives and similar repositories for Scale-RAE
Users that are interested in Scale-RAE are comparing it to the libraries listed below.
- Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis ☆428 Mar 15, 2026 Updated last month
- ☆44 Oct 29, 2025 Updated 5 months ago
- ☆12 Jul 18, 2024 Updated last year
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆181 Feb 24, 2026 Updated last month
- ECCV2024, LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models ☆18 Aug 9, 2024 Updated last year
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis ☆133 May 16, 2025 Updated 11 months ago
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing ☆61 Mar 19, 2026 Updated 3 weeks ago
- [ICLR 2026] PixNerd: Pixel Neural Field Diffusion ☆175 Dec 10, 2025 Updated 4 months ago
- [ICLR 2026] UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models ☆45 Aug 4, 2025 Updated 8 months ago
- LatentMorph: Morphing Latent Reasoning into Image Generation ☆39 Mar 29, 2026 Updated 2 weeks ago
- code & model for arxiv paper "Autoregressive Image Generation with Masked Bit Modeling" ☆44 Apr 8, 2026 Updated last week
- This repo contains the code for 1D tokenizer and generator ☆1,140 Mar 20, 2025 Updated last year
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆55 May 8, 2025 Updated 11 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 Dec 14, 2025 Updated 4 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆456 Mar 25, 2026 Updated 3 weeks ago
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence ☆326 Apr 9, 2026 Updated last week
- [CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ☆1,438 Dec 16, 2025 Updated 4 months ago
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆236 Jan 22, 2026 Updated 2 months ago
- Official PyTorch codes for "Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation", ECCV2024 ☆31 Jul 19, 2024 Updated last year
- Recursive Visual Programming (ECCV 2024) ☆18 Nov 20, 2024 Updated last year
- ☆15 Nov 11, 2024 Updated last year
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838 ☆1,898 Feb 20, 2026 Updated last month
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing ☆19 Dec 28, 2024 Updated last year
- [ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think ☆1,601 Mar 16, 2025 Updated last year
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation" ☆139 Oct 17, 2025 Updated 5 months ago
- Adapting Self-Supervised Representations as a Latent Space for Efficient Generation ☆40 Oct 17, 2025 Updated 5 months ago
- Official PyTorch implementation of paper “InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction” ☆33 Apr 3, 2026 Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆1,002 Nov 25, 2025 Updated 4 months ago
- [ECCV'24] A novel weakly supervised framework for 3D object detection from 2D bounding boxes. It can easily extend to novel scenarios and… ☆36 Jul 26, 2024 Updated last year
- Official Implementation of Paper Transfer between Modalities with MetaQueries ☆316 Oct 12, 2025 Updated 6 months ago
- [Preprint] UCGM: Unified Continuous Generative Models ☆183 May 27, 2025 Updated 10 months ago
- Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction ☆367 Jan 22, 2026 Updated 2 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆537 Apr 3, 2026 Updated last week
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.