LeapLabTHU / ENAT
[NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
☆22 Updated 2 months ago
Alternatives and similar repositories for ENAT:
Users who are interested in ENAT are comparing it to the repositories listed below.
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆33 Updated 5 months ago
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" ☆42 Updated 8 months ago
- ☆16 Updated 3 months ago
- ☆13 Updated 2 months ago
- [ECCV 2024] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators ☆43 Updated 5 months ago
- Official repository of Uni-AdaFocus (TPAMI 2024). ☆39 Updated 2 months ago
- Official implementation of Dynamic Perceiver ☆42 Updated last year
- ☆29 Updated last month
- Official repository of InLine attention (NeurIPS 2024) ☆39 Updated last month
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆24 Updated last year
- Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection ☆13 Updated last month
- [ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning ☆27 Updated 4 months ago
- [IEEE TPAMI] Latency-aware Unified Dynamic Networks for Efficient Image Recognition ☆46 Updated 9 months ago
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆75 Updated 3 weeks ago
- ☆17 Updated last month
- Liquid: Language Models are Scalable Multi-modal Generators ☆65 Updated 2 months ago
- Open implementation of "RandAR" ☆54 Updated last month
- ☆53 Updated 3 weeks ago
- PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆25 Updated 2 months ago
- Jittor implementation of Vision Transformer with Deformable Attention ☆30 Updated 2 years ago
- ☆23 Updated 4 months ago
- ☆36 Updated 2 years ago
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality" ☆45 Updated last month
- A collection of vision foundation models unifying understanding and generation. ☆40 Updated last month
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆51 Updated this week
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆50 Updated last week