LeapLabTHU / ENAT
[NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
☆22 · Updated 5 months ago
Alternatives and similar repositories for ENAT
Users who are interested in ENAT are comparing it to the repositories listed below.
- CODA: Repurposing Continuous VAEs for Discrete Tokenization ☆17 · Updated last month
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆34 · Updated 8 months ago
- ☆16 · Updated 2 months ago
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" ☆45 · Updated 11 months ago
- ☆13 · Updated 4 months ago
- [ECCV 2024] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators ☆44 · Updated 8 months ago
- Official repository of Uni-AdaFocus (TPAMI 2024). ☆42 · Updated 4 months ago
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ☆31 · Updated last month
- ☆35 · Updated 4 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆36 · Updated 3 weeks ago
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆24 · Updated last year
- Autoregressive Image Generation with Randomized Parallel Decoding ☆57 · Updated last month
- Official implementation of Dynamic Perceiver ☆43 · Updated last year
- ☆15 · Updated 2 months ago
- ☆21 · Updated 3 months ago
- Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection ☆24 · Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆70 · Updated 2 months ago
- Spatial-R1: The first MLLM trained using GRPO for spatial reasoning in videos ☆33 · Updated this week
- ☆79 · Updated last month
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025) ☆49 · Updated 2 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆29 · Updated this week
- ☆25 · Updated last month
- Official PyTorch Code of ReKV (ICLR'25) ☆17 · Updated 2 months ago
- A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆29 · Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆83 · Updated last month
- ☆25 · Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆83 · Updated last month
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 · Updated 3 months ago
- [ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning ☆30 · Updated 7 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆40 · Updated last month