codecaution / Awesome-Mixture-of-Experts-Papers
A curated reading list of research in Mixture-of-Experts (MoE).
☆660 · Oct 30, 2024 · Updated last year
Alternatives and similar repositories for Awesome-Mixture-of-Experts-Papers
Users interested in Awesome-Mixture-of-Experts-Papers are comparing it to the repositories listed below.
- A collection of AWESOME things about mixture-of-experts ☆1,262 · Dec 8, 2024 · Updated last year
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538; see the minimal gating sketch after this list) ☆1,228 · Apr 19, 2024 · Updated last year
- A fast MoE impl for PyTorch ☆1,834 · Feb 10, 2025 · Updated last year
- Tutel MoE: Optimized Mixture-of-Experts Library, supports GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4 ☆963 · Dec 21, 2025 · Updated last month
- ☆705 · Dec 6, 2025 · Updated 2 months ago
- A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆848 · Sep 13, 2023 · Updated 2 years ago
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,003 · Dec 6, 2024 · Updated last year
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,657 · Mar 8, 2024 · Updated last year
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆344 · Apr 2, 2025 · Updated 10 months ago
- Paper List for In-context Learning 🌷 ☆875 · Oct 8, 2024 · Updated last year
- Ongoing research training transformer models at scale ☆15,213 · Updated this week
- Distributional Generalization in NLP. A roadmap. ☆88 · Dec 12, 2022 · Updated 3 years ago
- Awesome papers on Language-Model-as-a-Service (LMaaS) ☆547 · May 14, 2024 · Updated last year
- ☆89 · Apr 2, 2022 · Updated 3 years ago
- Reading list for research topics in Masked Image Modeling ☆338 · Dec 3, 2024 · Updated last year
- [TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ☆482 · Jul 23, 2025 · Updated 6 months ago
- GMoE could be the next backbone model for many kinds of generalization tasks. ☆274 · Mar 21, 2023 · Updated 2 years ago
- Latest Advances on Multimodal Large Language Models ☆17,337 · Feb 7, 2026 · Updated last week
- Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃 ☆116 · Oct 27, 2022 · Updated 3 years ago
- A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models". ☆2,100 · Oct 5, 2023 · Updated 2 years ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆29 · Jan 23, 2024 · Updated 2 years ago
- ☆98 · Jun 6, 2022 · Updated 3 years ago
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ☆82 · Apr 10, 2023 · Updated 2 years ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL) ☆8,989 · Feb 6, 2026 · Updated last week
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,768 · Aug 4, 2024 · Updated last year
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,379 · Updated this week
- Fast and memory-efficient exact attention ☆22,231 · Updated this week
- ☆37 · May 7, 2023 · Updated 2 years ago
- Must-read papers on prompt-based tuning for pre-trained language models. ☆4,297 · Jul 17, 2023 · Updated 2 years ago
- Codebase for Merging Language Models (ICML 2024) ☆864 · May 5, 2024 · Updated last year
- A curated list of prompt-based papers in computer vision and vision-language learning. ☆928 · Dec 18, 2023 · Updated 2 years ago
- A curated list of reinforcement learning with human feedback resources (continually updated) ☆4,296 · Dec 9, 2025 · Updated 2 months ago
- Train transformer language models with reinforcement learning. ☆17,360 · Updated this week
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆82 · Oct 5, 2023 · Updated 2 years ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆20,619 · Feb 9, 2026 · Updated last week
- 📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉 ☆4,990 · Jan 18, 2026 · Updated 3 weeks ago
- This repository contains a collection of papers and resources on Reasoning in Large Language Models. ☆567 · Nov 13, 2023 · Updated 2 years ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆91 · Dec 3, 2024 · Updated last year
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓 ☆3,538 · May 7, 2025 · Updated 9 months ago
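
Most of the MoE implementations listed above build on the same core mechanism, the sparsely gated layer of Shazeer et al. (https://arxiv.org/abs/1701.06538) referenced in the re-implementation entry: a learned router scores every expert for each token, only the top-k experts actually run, and their outputs are combined with the renormalized gate weights. The sketch below illustrates that mechanism only; the class name `TopKMoE`, the layer sizes, and the per-expert dispatch loop are illustrative assumptions, not the API of any repository in this list.

```python
# Minimal top-k sparsely-gated MoE layer (illustrative sketch only; not
# the API of any listed repo). Each token is routed to its top-k experts,
# and expert outputs are mixed by the renormalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2, hidden: int = 2048):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores, idx = self.gate(x).topk(self.k, dim=-1)   # each (tokens, k)
        weights = F.softmax(scores, dim=-1)               # renormalize over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            picked = (idx == e)            # (tokens, k): slots routed to expert e
            rows = picked.any(dim=-1)      # tokens that use expert e at all
            if rows.any():
                w = (weights * picked).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])  # accumulate weighted expert output
        return out

moe = TopKMoE(dim=512)
y = moe(torch.randn(16, 512))  # 16 tokens in, (16, 512) out; each token uses 2 of 8 experts
```

The per-expert Python loop keeps the sketch readable; libraries such as FastMoE and Tutel above replace it with batched dispatch/combine kernels and add load-balancing losses, which is where most of the engineering in these repos lives.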