deepseek-ai / DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
☆1,801 Updated last year
Alternatives and similar repositories for DeepSeek-MoE
Users who are interested in DeepSeek-MoE are comparing it to the repositories listed below.
- Expert Specialized Fine-Tuning ☆705 Updated 4 months ago
- Scalable RL solution for advanced reasoning of language models ☆1,742 Updated 6 months ago
- Muon is Scalable for LLM Training ☆1,318 Updated last month
- An Open Large Reasoning Model for Real-World Solutions ☆1,522 Updated 4 months ago
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models ☆1,821 Updated 8 months ago
- Official Repo for Open-Reasoner-Zero ☆2,045 Updated 3 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,605 Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆989 Updated 9 months ago
- OLMoE: Open Mixture-of-Experts Language Models ☆872 Updated last week
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ☆2,918 Updated last year
- O1 Replication Journey ☆1,998 Updated 8 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,908 Updated 5 months ago
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model ☆4,939 Updated last year
- Simple RL training for reasoning ☆3,754 Updated last month
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,560 Updated 4 months ago
- Reproduce R1 Zero on Logic Puzzle ☆2,400 Updated 6 months ago
- LongBench v2 and LongBench (ACL 25'&24') ☆977 Updated 8 months ago
- Analyze computation-communication overlap in V3/R1. ☆1,103 Updated 6 months ago
- Large Reasoning Models ☆805 Updated 9 months ago
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud. ☆1,354 Updated last week
- Expert Parallelism Load Balancer ☆1,272 Updated 6 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,133 Updated this week
- A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI ☆770 Updated last year
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆923 Updated 7 months ago
- ☆815 Updated 3 months ago
- ☆963 Updated 8 months ago
- ☆1,352 Updated 10 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. ☆2,863 Updated 6 months ago
- DataComp for Language Models ☆1,369 Updated 3 weeks ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,613 Updated last year