Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
☆212Feb 13, 2026Updated 3 weeks ago
Alternatives and similar repositories for Scale-RAE
Users that are interested in Scale-RAE are comparing it to the libraries listed below
- ☆39Oct 29, 2025Updated 4 months ago
- ECCV2024, LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models☆18Aug 9, 2024Updated last year
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"☆138Oct 17, 2025Updated 4 months ago
- ☆12Jul 18, 2024Updated last year
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"☆175Feb 24, 2026Updated last week
- Official PyTorch implementation of paper “InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction”☆33Jul 28, 2025Updated 7 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)☆53May 8, 2025Updated 9 months ago
- Official PyTorch codes for "Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation", ECCV2024☆30Jul 19, 2024Updated last year
- [ICLR 2026] PixNerd: Pixel Neural Field Diffusion☆170Dec 10, 2025Updated 2 months ago
- This repo contains the code for 1D tokenizer and generator☆1,120Mar 20, 2025Updated 11 months ago
- Adapting Self-Supervised Representations as a Latent Space for Efficient Generation☆40Oct 17, 2025Updated 4 months ago
- [ECCV'24] A novel weakly supervised framework for 3D object detection from 2D bounding boxes. It can easily extend to novel scenarios and…☆36Jul 26, 2024Updated last year
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling☆433Feb 25, 2026Updated last week
- [ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think☆1,560Mar 16, 2025Updated 11 months ago
- SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing☆19Dec 28, 2024Updated last year
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis☆131May 16, 2025Updated 9 months ago
- [CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models☆1,409Dec 16, 2025Updated 2 months ago
- [Preprint] UCGM: Unified Continuous Generative Models☆182May 27, 2025Updated 9 months ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆108Dec 20, 2025Updated 2 months ago
- ☆49Feb 12, 2026Updated 3 weeks ago
- SEED-Voken: A Series of Powerful Visual Tokenizers☆997Nov 25, 2025Updated 3 months ago
- The code for "Toward Accurate and Temporally Consistent Video Restoration from Raw Data"☆16Dec 25, 2023Updated 2 years ago
- Official Implementation of pMF https://arxiv.org/abs/2601.22158☆178Feb 19, 2026Updated 2 weeks ago
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838☆1,872Feb 20, 2026Updated 2 weeks ago
- Make self forcing endless. Add cache purging. Add prompt controllability.☆69Sep 9, 2025Updated 5 months ago
- ☆10Sep 17, 2022Updated 3 years ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆17Nov 19, 2025Updated 3 months ago
- ☆17Nov 18, 2025Updated 3 months ago
- [CVPR 2026] 3D Motion Reconstruction for 4D Synthesis☆129Jan 28, 2026Updated last month
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory☆417Jul 25, 2025Updated 7 months ago
- [🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s …☆648Feb 27, 2026Updated last week
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆76Jan 25, 2026Updated last month
- code for "TVG: A Training-free Transition Video Generation Method with Diffusion Models"☆48Aug 19, 2024Updated last year
- [ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling☆78Sep 28, 2025Updated 5 months ago
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"☆425Jun 20, 2025Updated 8 months ago
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.☆321Jul 9, 2024Updated last year
- FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation (ICCV 2023)☆24Sep 24, 2023Updated 2 years ago
- [Arxiv 2025] ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions☆45Jun 11, 2025Updated 8 months ago
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing☆58Dec 26, 2025Updated 2 months ago