OpenSparseLLMs / Skip-DiT
✈️ Accelerating Vision Diffusion Transformers with Skip Branches.
☆60Updated 2 months ago
Alternatives and similar repositories for Skip-DiT:
Users who are interested in Skip-DiT compare it to the libraries listed below.
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark"☆40Updated 3 weeks ago
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 2 months ago
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality"☆45Updated last month
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Infe…☆91Updated 3 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆54Updated 5 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching☆92Updated 7 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆116Updated 9 months ago
- 🚀LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training☆69Updated 2 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆52Updated last month
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆75Updated 3 weeks ago
- 📚 Collection of awesome generation acceleration resources.☆137Updated this week
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆23Updated 2 weeks ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…☆19Updated 2 months ago
- Code release for VTW (AAAI 2025 Oral)☆32Updated last month
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆29Updated 2 months ago
- Triton implementation of bidirectional (non-causal) linear attention☆41Updated 2 weeks ago
- This is a repo to track the latest autoregressive visual generation papers.☆137Updated last week
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆48Updated 4 months ago
- 📚 Collection of token reduction for model compression resources.☆27Updated this week
- FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.☆35Updated 7 months ago
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆85Updated 6 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.☆31Updated last month
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆69Updated last week
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.☆47Updated 2 months ago