ThisisBillhe / ZipAR
This is the official PyTorch implementation of "ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality"
☆35 Updated last week
Alternatives and similar repositories for ZipAR:
Users who are interested in ZipAR are comparing it to the repositories listed below:
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆73 Updated 2 weeks ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆82 Updated 5 months ago
- This is a repo to track the latest autoregressive visual generation papers. ☆71 Updated last week
- Official Repo of γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆25 Updated last month
- 📚 Collection of awesome generation acceleration resources. ☆63 Updated this week
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆31 Updated 2 weeks ago
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin… ☆61 Updated last month
- Adapting LLaMA Decoder to Vision Transformer ☆27 Updated 6 months ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation ☆51 Updated 3 months ago
- Accelerating Diffusion Transformers with Token-wise Feature Caching ☆31 Updated last month
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆46 Updated last month
- Liquid: Language Models are Scalable Multi-modal Generators ☆23 Updated this week
- A collection of awesome autoregressive visual generation models ☆51 Updated this week
- Code release for VTW (AAAI 2025) ☆27 Updated last week
- TinyFusion: Diffusion Transformers Learned Shallow ☆67 Updated last week
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆84 Updated 5 months ago
- Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" ☆21 Updated last month
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆32 Updated 3 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆103 Updated 7 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆57 Updated last month
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆26 Updated last month
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆164 Updated this week
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆50 Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆78 Updated 2 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆29 Updated 2 weeks ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆41 Updated 2 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification ☆14 Updated 4 months ago