zhijie-group / AdaMoE
[Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
☆15Updated last year
Alternatives and similar repositories for AdaMoE
Users who are interested in AdaMoE are comparing it to the repositories listed below.
- ☆22Updated 6 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆42Updated 7 months ago
- Code release for VTW (AAAI 2025 Oral)☆61Updated 2 weeks ago
- [ICML 2025 Oral] Mixture of Lookup Experts☆54Updated 6 months ago
- ☆27Updated last year
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆36Updated 10 months ago
- ☆30Updated 2 months ago
- ☆60Updated 6 months ago
- ☆61Updated 4 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆104Updated last year
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆144Updated 4 months ago
- Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA☆25Updated this week
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models☆21Updated last year
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆40Updated 3 weeks ago
- ☆106Updated 2 months ago
- [NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok…☆57Updated last week
- ☆123Updated last year
- A training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity☆39Updated 5 months ago
- ☆63Updated 6 months ago
- ☆14Updated last year
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training☆88Updated 11 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.☆97Updated 4 months ago
- ☆45Updated last month
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆96Updated this week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆87Updated 9 months ago
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models☆117Updated 6 months ago
- ☆44Updated last year
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient☆61Updated last month
- The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…☆105Updated 2 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆97Updated 2 months ago