arpita8 / Awesome-Mixture-of-Experts-Papers
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
☆118 · Updated 9 months ago
Alternatives and similar repositories for Awesome-Mixture-of-Experts-Papers
Users interested in Awesome-Mixture-of-Experts-Papers are comparing it to the repositories listed below.
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆94 · Updated this week
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆173 · Updated 2 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆161 · Updated 11 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆156 · Updated last month
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆166 · Updated 5 months ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR). ☆70 · Updated 2 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆221 · Updated last month
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆129 · Updated last week
- ☆178 · Updated 5 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆89 · Updated last year
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆166 · Updated last week
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆84 · Updated last year
- Minimal GRPO implementation from scratch ☆90 · Updated 2 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆169 · Updated 2 weeks ago
- A simple extension of vLLM to help you speed up reasoning models without training. ☆158 · Updated this week
- ☆105 · Updated 2 months ago
- Official repository for "Reinforcement Learning for Reasoning in Large Language Models with One Training Example" ☆257 · Updated this week
- TransMLA: Multi-Head Latent Attention Is All You Need ☆287 · Updated this week
- Some preliminary explorations of Mamba's context scaling. ☆214 · Updated last year
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆59 · Updated last month
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆78 · Updated 5 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆215 · Updated 3 weeks ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆301 · Updated last month
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆64 · Updated 3 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆463 · Updated 3 months ago
- The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆365 · Updated 2 months ago
- 🔥 A minimal training framework for scaling FLA models ☆146 · Updated 3 weeks ago
- ☆95 · Updated 2 weeks ago
- An extension of the nanoGPT repository for training small MoE models. ☆147 · Updated 2 months ago
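
For readers new to the area, below is a minimal sketch of the top-k-routed Mixture-of-Experts feed-forward layer that most of the repositories above build on or modify. It is illustrative only: the class name, hyperparameters, and the simple per-expert loop are assumptions made for clarity, and it omits the load-balancing losses, capacity limits, and expert-parallel execution that the real implementations listed here rely on.

```python
# Minimal, illustrative sketch of a top-k routed MoE feed-forward layer (not taken
# from any repository above). Assumes PyTorch; names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # one routing logit per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); flatten so routing is decided per token
        tokens = x.reshape(-1, x.shape[-1])
        probs = F.softmax(self.router(tokens), dim=-1)         # (num_tokens, num_experts)
        weights, indices = probs.topk(self.top_k, dim=-1)      # each token picks k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize selected gates
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_ids, slots = (indices == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slots].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoE(d_model=64, d_hidden=256)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```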