A curated reading list of research in Mixture-of-Experts (MoE).
☆661 · updated Oct 30, 2024
Alternatives and similar repositories for Awesome-Mixture-of-Experts-Papers
Users interested in Awesome-Mixture-of-Experts-Papers are comparing it to the repositories listed below.
- A collection of AWESOME things about mixture-of-experts ☆1,269 · updated Dec 8, 2024
- PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538) ☆1,232 · updated Apr 19, 2024
- A fast MoE implementation for PyTorch ☆1,845 · updated Feb 10, 2025
- Tutel MoE: optimized Mixture-of-Experts library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 with FP8/NVFP4/MXFP4 ☆973 · updated this week
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,002 · updated Dec 6, 2024
- Survey: a collection of AWESOME papers and resources on the latest research in Mixture of Experts ☆140 · updated Aug 21, 2024
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,664 · updated Mar 8, 2024
- Paper list for in-context learning 🌷 ☆874 · updated Oct 8, 2024
- Ongoing research training transformer models at scale ☆15,535 · updated this week
- Distributional generalization in NLP: a roadmap ☆88 · updated Dec 12, 2022
- Reading list for research topics in Masked Image Modeling ☆335 · updated Dec 3, 2024
- [TKDE'25] Official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models" ☆482 · updated Jul 23, 2025
- Latest advances on multimodal large language models ☆17,416 · updated this week
- Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃 ☆116 · updated Oct 27, 2022
- A trend starting from "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" ☆2,101 · updated Oct 5, 2023
- [ECCV 2024] Official implementation of "Stitched ViTs are Flexible Vision Backbones" ☆29 · updated Jan 23, 2024
- A curated list of awesome resources dedicated to scaling laws for LLMs ☆81 · updated Apr 10, 2023
- An easy-to-use, scalable, and high-performance agentic RL framework based on Ray (PPO, DAPO, REINFORCE++, TIS, vLLM, async RL) ☆9,084 · updated Mar 3, 2026
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,766 · updated Aug 4, 2024
- Fast and memory-efficient exact attention ☆22,460 · updated this week
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,474 · updated Mar 3, 2026
- Must-read papers on prompt-based tuning for pre-trained language models ☆4,293 · updated Jul 17, 2023
- Codebase for merging language models (ICML 2024) ☆863 · updated May 5, 2024
- A curated list of prompt-based papers in computer vision and vision-language learning ☆925 · updated Dec 18, 2023
- A curated list of reinforcement learning from human feedback resources (continually updated) ☆4,317 · updated Dec 9, 2025
- Train transformer language models with reinforcement learning ☆17,523 · updated this week
- PyTorch implementation of Soft MoE by Google Brain, from "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆82 · updated Oct 5, 2023
- 🤗 PEFT: state-of-the-art parameter-efficient fine-tuning ☆20,717 · updated Mar 3, 2026
- 📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉 ☆5,040 · updated Feb 27, 2026
- A collection of papers and resources on reasoning in large language models ☆567 · updated Nov 13, 2023
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from the Perspective of Mixture-of-Experts with Post-Training ☆93 · updated Dec 3, 2024
- From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓 ☆3,554 · updated May 7, 2025
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆92 · updated Mar 16, 2023
- Transformer-related optimization, including BERT and GPT ☆6,399 · updated Mar 27, 2024
- A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques ☆6,896 · updated Dec 17, 2025
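Several of the libraries above (FastMoE, Tutel, the Shazeer et al. re-implementation) center on the same core idea: sparsely-gated top-k routing, where a small gating network picks a few experts per input and combines their outputs. As a rough, dependency-free illustration of that idea — not the implementation used by any repository listed, and with all names, sizes, and gate weights invented for the example — a minimal sketch:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Sparsely-gated MoE forward pass (after Shazeer et al.,
    arXiv:1701.06538): score experts with a linear gate, keep the
    top-k, and mix their outputs with renormalized gate probabilities.
    Only the selected experts are evaluated."""
    # Gate logits: one dot product of the input with each expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Indices of the k highest-probability experts.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    outputs = {i: experts[i](x) for i in top}  # run only the chosen experts
    return [sum((probs[i] / norm) * outputs[i][d] for i in top)
            for d in range(len(x))]

# Toy usage: 4 "experts", each a fixed elementwise scaling of the input.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (0.5, 1.0, 2.0, 3.0)]
gate_weights = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.05, 0.05]]
y = moe_forward([1.0, 2.0], experts, gate_weights, k=2)
```

Real MoE layers add load-balancing losses and capacity limits on top of this routing so that tokens spread evenly across experts; the repositories above differ mainly in how efficiently they dispatch tokens to experts across devices.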