deepseek-ai / DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
☆1,704 · Updated last year
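The repositories below all relate to sparse Mixture-of-Experts models of the kind DeepSeek-MoE implements. As background for readers comparing them, here is a minimal, generic sketch of top-k expert gating, the routing mechanism these MoE models share. This is an illustration only, not DeepSeek-MoE's actual code: the repo's design additionally uses fine-grained expert segmentation and shared experts that every token always passes through.

```python
import numpy as np

def top_k_gating(logits, k=2):
    """Route each token to its top-k experts with renormalized softmax weights.

    logits: array of shape (num_tokens, num_experts), one router score per expert.
    Returns (indices, weights): for each token, the k chosen expert indices and
    their mixing weights, which sum to 1.
    """
    # Indices of the k largest router logits per token.
    top_idx = np.argsort(logits, axis=-1)[:, -k:]
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over only the selected experts (numerically stabilized).
    weights = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return top_idx, weights

# 4 tokens routed over 8 experts.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))
idx, w = top_k_gating(logits, k=2)
# idx has shape (4, 2); each row of w sums to 1.
```

In a full MoE layer, each token's output is the weight-averaged sum of the k selected experts' feed-forward outputs, so compute scales with k rather than with the total expert count.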
Alternatives and similar repositories for DeepSeek-MoE
Users interested in DeepSeek-MoE are comparing it to the repositories listed below.
- Muon is Scalable for LLM Training ☆1,049 · Updated 2 months ago
- Official Repo for Open-Reasoner-Zero ☆1,930 · Updated last month
- Large Reasoning Models ☆804 · Updated 5 months ago
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,261 · Updated 3 weeks ago
- Expert Specialized Fine-Tuning ☆615 · Updated last week
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆895 · Updated 3 months ago
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 ☆1,266 · Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆3,044 · Updated this week
- ☆522 · Updated 9 months ago
- Analyze computation-communication overlap in V3/R1 ☆1,040 · Updated 2 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,534 · Updated last year
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL ☆2,441 · Updated last week
- Scalable RL solution for advanced reasoning of language models ☆1,587 · Updated 2 months ago
- O1 Replication Journey ☆1,990 · Updated 4 months ago
- Simple RL training for reasoning ☆3,584 · Updated last month
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLMs' inference via approximate and dynamic sparse attention calculation… ☆1,035 · Updated this week
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model ☆4,895 · Updated 8 months ago
- Expert Parallelism Load Balancer ☆1,199 · Updated 2 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,777 · Updated last month
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ☆2,723 · Updated last year
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆868 · Updated last month
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆3,844 · Updated last year
- An Open Large Reasoning Model for Real-World Solutions ☆1,494 · Updated this week
- ☆773 · Updated last month
- Minimalistic large language model 3D-parallelism training ☆1,888 · Updated last week
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments ☆1,876 · Updated this week
- Distributed RL System for LLM Reasoning ☆1,281 · Updated this week
- OLMoE: Open Mixture-of-Experts Language Models ☆764 · Updated 2 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,249 · Updated 2 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training ☆2,784 · Updated 2 months ago