LargeWorldModel / ElasticTok
ElasticTok: Adaptive Tokenization for Image and Video
☆33 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for ElasticTok
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆78 · Updated 10 months ago
- ☆110 · Updated last year
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth? ☆78 · Updated 2 weeks ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆59 · Updated last month
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆55 · Updated last month
- ☆31 · Updated 3 weeks ago
- Official implementation of "Self-Improving Video Generation" ☆52 · Updated last week
- ☆43 · Updated 2 months ago
- ☆44 · Updated 2 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆41 · Updated 3 weeks ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆47 · Updated last week
- ☆64 · Updated 4 months ago
- Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" ☆20 · Updated this week
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆30 · Updated 4 months ago
- [arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization ☆84 · Updated 5 months ago
- Official Release of NeurIPS 2023 Spotlight paper "Object-Centric Slot Diffusion" ☆58 · Updated 8 months ago
- The codebase of our paper "Improving the Training of Rectified Flows" ☆82 · Updated last month
- Minimal multi-GPU implementation of EDM2: "Analyzing and Improving the Training Dynamics of Diffusion Models" ☆26 · Updated 8 months ago
- The official implementation of Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility ☆27 · Updated 3 weeks ago
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆23 · Updated 2 weeks ago
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ☆94 · Updated last month
- 🔥 Stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆42 · Updated 5 months ago
- [ECCV 2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation … ☆34 · Updated 3 weeks ago
- Code for the paper "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos" published at CVPR 2024 ☆44 · Updated 8 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆76 · Updated last month
- VQVAE for video prediction ☆26 · Updated 2 years ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆52 · Updated last year
- [ICML 2024] Compositional Image Decomposition with Diffusion Models ☆40 · Updated 4 months ago
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆71 · Updated 3 weeks ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆23 · Updated last year