OpenSparseLLMs / Skip-DiT
✈️ Accelerating Vision Diffusion Transformers with Skip Branches.
☆58Updated 3 weeks ago
Alternatives and similar repositories for Skip-DiT:
Users who are interested in Skip-DiT are comparing it to the libraries listed below.
- Open-Pandora: On-the-fly Control Video Generation☆31Updated last month
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Infe…☆86Updated 2 months ago
- 🚀LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training☆63Updated last month
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆51Updated 4 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆50Updated this week
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆109Updated 7 months ago
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality"☆43Updated last week
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆56Updated 3 months ago
- CLIP-MoE: Mixture of Experts for CLIP☆23Updated 3 months ago
- DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆17Updated last month
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆55Updated 2 months ago
- Code release for VTW (AAAI 2025)☆27Updated last month
- Official implementation of MIA-DPO☆48Updated 2 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video.☆61Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆80Updated 2 months ago
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆82Updated 5 months ago
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?☆88Updated last month
- This is a repo to track the latest autoregressive visual generation papers.☆96Updated last week
- PyTorch implementation of StableMask (ICML'24)☆12Updated 6 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching☆88Updated 5 months ago
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin…☆66Updated 2 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆56Updated 3 months ago
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆215Updated last week
- A Self-Training Framework for Vision-Language Reasoning☆57Updated last month
- Code for the paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM☆46Updated 3 months ago