OpenSparseLLMs / Skip-DiT
✈️ Accelerating Vision Diffusion Transformers with Skip Branches.
☆61 Updated 3 months ago
Alternatives and similar repositories for Skip-DiT:
Users who are interested in Skip-DiT are comparing it to the repositories listed below:
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality" ☆46 Updated 2 months ago
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark" ☆44 Updated last month
- ☆50 Updated this week
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆98 Updated 8 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆32 Updated 3 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆84 Updated 2 months ago
- FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling. ☆40 Updated 8 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In… ☆92 Updated 4 months ago
- ☆57 Updated last month
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆54 Updated 6 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆31 Updated 3 months ago
- SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training ☆177 Updated last month
- [CVPR 2025] PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆75 Updated 2 weeks ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆42 Updated 2 months ago
- ☆20 Updated 3 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆86 Updated 2 weeks ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆64 Updated 4 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆119 Updated 10 months ago
- [CVPR 2025] TinyFusion: Diffusion Transformers Learned Shallow ☆86 Updated 3 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆84 Updated 5 months ago
- This is a repo to track the latest autoregressive visual generation papers. ☆164 Updated last week
- Code release for VTW (AAAI 2025 Oral) ☆32 Updated 2 months ago
- 📚 Collection of awesome generation acceleration resources. ☆177 Updated last week
- Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" ☆26 Updated last month