Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
☆242 · Feb 13, 2026 · Updated 2 months ago
Alternatives and similar repositories for Scale-RAE
Users that are interested in Scale-RAE are comparing it to the libraries listed below.
- Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis ☆453 · Mar 15, 2026 · Updated last month
- ☆45 · Oct 29, 2025 · Updated 6 months ago
- ☆12 · Jul 18, 2024 · Updated last year
- ECCV2024, LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models ☆18 · Aug 9, 2024 · Updated last year
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆186 · Feb 24, 2026 · Updated 2 months ago
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis ☆134 · May 16, 2025 · Updated 11 months ago
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing ☆63 · Mar 19, 2026 · Updated last month
- [ICLR 2026] PixNerd: Pixel Neural Field Diffusion ☆176 · Dec 10, 2025 · Updated 4 months ago
- [ICLR 2026] UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models ☆48 · Aug 4, 2025 · Updated 9 months ago
- [ICML 2026] LatentMorph: Morphing Latent Reasoning into Image Generation ☆41 · Updated this week
- This repo contains the code for 1D tokenizer and generator ☆1,148 · Mar 20, 2025 · Updated last year
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 · Dec 14, 2025 · Updated 4 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆56 · May 8, 2025 · Updated 11 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆462 · Apr 16, 2026 · Updated 3 weeks ago
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆237 · Jan 22, 2026 · Updated 3 months ago
- [CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ☆1,456 · Dec 16, 2025 · Updated 4 months ago
- [ICML 2026] code & model for arxiv paper "Autoregressive Image Generation with Masked Bit Modeling" ☆51 · Updated this week
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence ☆340 · Apr 16, 2026 · Updated 3 weeks ago
- Official Implementation of Paper Transfer between Modalities with MetaQueries ☆317 · Oct 12, 2025 · Updated 6 months ago
- Official PyTorch codes for "Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation", ECCV2024 ☆31 · Jul 19, 2024 · Updated last year
- Recursive Visual Programming (ECCV 2024) ☆18 · Nov 20, 2024 · Updated last year
- ☆15 · Nov 11, 2024 · Updated last year
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838 ☆1,905 · Feb 20, 2026 · Updated 2 months ago
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing ☆19 · Dec 28, 2024 · Updated last year
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation" ☆139 · Oct 17, 2025 · Updated 6 months ago
- [ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think ☆1,621 · Mar 16, 2025 · Updated last year
- Official PyTorch implementation of paper “InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction” ☆33 · Apr 3, 2026 · Updated last month
- [ICLR 2026] Adapting Self-Supervised Representations as a Latent Space for Efficient Generation ☆50 · Apr 24, 2026 · Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆1,004 · Nov 25, 2025 · Updated 5 months ago
- [ECCV'24] A novel weakly supervised framework for 3D object detection from 2D bounding boxes. It can easily extend to novel scenarios and… ☆36 · Jul 26, 2024 · Updated last year
- [Preprint] UCGM: Unified Continuous Generative Models ☆184 · May 27, 2025 · Updated 11 months ago
- Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders" ☆1,870 · Feb 25, 2026 · Updated 2 months ago
- Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction ☆402 · Jan 22, 2026 · Updated 3 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆540 · Apr 3, 2026 · Updated last month
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. ☆323 · Jul 9, 2024 · Updated last year
- Official Implementation of the Paper: Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models (Siggraph 20… ☆29 · Mar 29, 2026 · Updated last month
- Make self forcing endless. Add cache purging. Add prompt controllability. ☆70 · Sep 9, 2025 · Updated 7 months ago
- code for "TVG: A Training-free Transition Video Generation Method with Diffusion Models" ☆50 · Aug 19, 2024 · Updated last year
- JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers ☆17 · Jul 21, 2025 · Updated 9 months ago