ZitengWangNYU / Scale-RAE
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
☆205 · Updated this week
Alternatives and similar repositories for Scale-RAE
Users that are interested in Scale-RAE are comparing it to the libraries listed below
- ☆34 · Oct 29, 2025 · Updated 3 months ago
- ECCV2024, LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models ☆18 · Aug 9, 2024 · Updated last year
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation" ☆137 · Oct 17, 2025 · Updated 3 months ago
- ☆12 · Jul 18, 2024 · Updated last year
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆172 · Dec 17, 2025 · Updated last month
- Official PyTorch implementation of paper “InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction” ☆33 · Jul 28, 2025 · Updated 6 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆53 · May 8, 2025 · Updated 9 months ago
- Official PyTorch codes for "Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation", ECCV2024 ☆30 · Jul 19, 2024 · Updated last year
- Official Implementation of pMF https://arxiv.org/abs/2601.22158 ☆141 · Updated this week
- [ICLR 2026] PixNerd: Pixel Neural Field Diffusion ☆169 · Dec 10, 2025 · Updated 2 months ago
- This repo contains the code for 1D tokenizer and generator ☆1,113 · Mar 20, 2025 · Updated 10 months ago
- Adapting Self-Supervised Representations as a Latent Space for Efficient Generation ☆38 · Oct 17, 2025 · Updated 3 months ago
- [ECCV'24] A novel weakly supervised framework for 3D object detection from 2D bounding boxes. It can easily extend to novel scenarios and… ☆36 · Jul 26, 2024 · Updated last year
- [ICLR 2026] UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models ☆43 · Aug 4, 2025 · Updated 6 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆427 · Jan 7, 2026 · Updated last month
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing ☆18 · Dec 28, 2024 · Updated last year
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis ☆130 · May 16, 2025 · Updated 8 months ago
- [CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ☆1,388 · Dec 16, 2025 · Updated last month
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis" ☆107 · Dec 20, 2025 · Updated last month
- [Preprint] UCGM: Unified Continuous Generative Models ☆180 · May 27, 2025 · Updated 8 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆993 · Nov 25, 2025 · Updated 2 months ago
- ☆48 · Updated this week
- ☆117 · Jan 28, 2026 · Updated 2 weeks ago
- [ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think ☆1,544 · Mar 16, 2025 · Updated 10 months ago
- The code for "Toward Accurate and Temporally Consistent Video Restoration from Raw Data" ☆16 · Dec 25, 2023 · Updated 2 years ago
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838 ☆1,859 · Sep 27, 2024 · Updated last year
- ☆17 · Nov 18, 2025 · Updated 2 months ago
- ☆10 · Sep 17, 2022 · Updated 3 years ago
- Make self forcing endless. Add cache purging. Add prompt controllability. ☆69 · Sep 9, 2025 · Updated 5 months ago
- [🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s M… ☆602 · Dec 25, 2025 · Updated last month
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory ☆414 · Jul 25, 2025 · Updated 6 months ago
- code for "TVG: A Training-free Transition Video Generation Method with Diffusion Models" ☆48 · Aug 19, 2024 · Updated last year
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆75 · Jan 25, 2026 · Updated 2 weeks ago
- [ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling ☆77 · Sep 28, 2025 · Updated 4 months ago
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing ☆58 · Dec 26, 2025 · Updated last month
- [Arxiv 2025] ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions ☆45 · Jun 11, 2025 · Updated 8 months ago
- FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation (ICCV 2023) ☆24 · Sep 24, 2023 · Updated 2 years ago
- [NIPS 2025] FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens ☆20 · Oct 12, 2025 · Updated 4 months ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆12 · Mar 11, 2025 · Updated 11 months ago