LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆70 · Updated last week
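DynMoE's core idea is letting each token activate a variable number of experts instead of a fixed top-k. As a rough illustration only (this is not the DynMoE code; the function name, threshold value, and use of NumPy are assumptions for the sketch), variable-k "top-any" gating can be mocked up like this:

```python
import numpy as np

def dyn_gate(scores, threshold=0.7):
    """Select, per token, every expert whose gate score clears the
    threshold, keeping at least one expert per token. Illustrative
    sketch only; the actual DynMoE routing and its auto-tuning of
    the expert count live in the repository."""
    mask = scores >= threshold  # (tokens, experts) boolean routing mask
    # Guarantee each token routes to at least its best-scoring expert.
    best = scores.argmax(axis=1)
    mask[np.arange(scores.shape[0]), best] = True
    return mask

rng = np.random.default_rng(0)
scores = rng.uniform(size=(4, 8))   # gate scores for 4 tokens, 8 experts
mask = dyn_gate(scores)
print(mask.sum(axis=1))             # number of experts used varies per token
```

The point of the sketch is that the per-token expert count falls out of the threshold comparison rather than a fixed k, which is what makes the approach "dynamic".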
Alternatives and similar repositories for DynMoE:
Users interested in DynMoE are comparing it to the repositories listed below.
- Code release for VTW (AAAI 2025 Oral) ☆32 · Updated last month
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Infe… ☆91 · Updated 3 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆49 · Updated 4 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆31 · Updated 7 months ago
- ☆93 · Updated 7 months ago
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark" ☆41 · Updated 3 weeks ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆72 · Updated 8 months ago
- ✈️ Accelerating Vision Diffusion Transformers with Skip Branches ☆60 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆89 · Updated last month
- ☆35 · Updated last month
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆116 · Updated 9 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆63 · Updated 3 weeks ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆37 · Updated 2 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction" ☆53 · Updated last month
- ☆47 · Updated 2 weeks ago
- Official implementation of the paper "Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal … ☆39 · Updated last week
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆82 · Updated 8 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆32 · Updated 2 months ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster" ☆47 · Updated 2 months ago
- [ICML 2024] PyTorch implementation of "CaM: Cache Merging for Memory-efficient LLMs Inference" ☆32 · Updated 8 months ago
- [ACL 2024] Code for "MELoRA: Mini-Ensemble Low-Rank Adapter for Parameter-Efficient Fine-Tuning" ☆16 · Updated this week
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆44 · Updated 3 months ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from the Perspective of Mixture-of-Experts with Post-Training ☆69 · Updated 2 months ago
- [NeurIPS 2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆48 · Updated 2 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆54 · Updated 5 months ago
- ☆17 · Updated 2 months ago
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆51 · Updated last week
- [NeurIPS 2024] The official implementation of "ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification" ☆17 · Updated 6 months ago