arpita8 / Awesome-Mixture-of-Experts-Papers
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
☆101 · Updated 5 months ago
Alternatives and similar repositories for Awesome-Mixture-of-Experts-Papers:
Users interested in Awesome-Mixture-of-Experts-Papers are comparing it to the repositories listed below.
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆104 · Updated 3 weeks ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models) ☆96 · Updated 5 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 · Updated last month
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆81 · Updated 8 months ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆149 · Updated last month
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆81 · Updated this week
- The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆227 · Updated 3 weeks ago
- A brief and partial summary of RLHF algorithms. ☆93 · Updated 2 months ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression". ☆114 · Updated 2 months ago
- The official implementation of the paper "Demystifying the Compression of Mixture-of-Experts Through a Unified Framework". ☆57 · Updated 3 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆58 · Updated 3 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆158 · Updated 2 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆194 · Updated 2 weeks ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆152 · Updated last month
- Awesome LLM Plaza: daily tracking of all sorts of awesome LLM topics, e.g. LLMs for coding, robotics, reasoning, multimodality, etc. ☆185 · Updated this week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆270 · Updated this week
- Some preliminary explorations of Mamba's context scaling. ☆213 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆43 · Updated last week
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆145 · Updated 7 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆182 · Updated last week
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆289 · Updated 2 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆140 · Updated 11 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆114 · Updated 8 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆172 · Updated 6 months ago