deepseek-ai / DeepSeek-MoE

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

☆1,758

Alternatives and similar repositories for DeepSeek-MoE

Users who are interested in DeepSeek-MoE are comparing it to the repositories listed below.

deepseek-ai / ESFT
Expert Specialized Fine-Tuning
☆654 Updated 2 months ago
XueFuzhao / OpenMoE
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,568 Updated last year
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,223 Updated 4 months ago
deepseek-ai / DeepSeek-Math
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
☆2,830 Updated last year
Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,008 Updated last month
SafeAILab / EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
☆1,439 Updated this week
allenai / OLMoE
OLMoE: Open Mixture-of-Experts Language Models
☆823 Updated 4 months ago
PRIME-RL / PRIME
Scalable RL solution for advanced reasoning of language models
☆1,668 Updated 4 months ago
allenai / open-instruct
AllenAI's post-training codebase
☆3,083 Updated this week
deepseek-ai / DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
☆4,936 Updated 10 months ago
GAIR-NLP / O1-Journey
O1 Replication Journey
☆1,996 Updated 6 months ago
MoonshotAI / MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
☆1,846 Updated 3 months ago
AIDC-AI / Marco-o1
An Open Large Reasoning Model for Real-World Solutions
☆1,508 Updated 2 months ago
mlfoundations / dclm
DataComp for Language Models
☆1,342 Updated 4 months ago
pjlab-sys4nlp / llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
☆977 Updated 7 months ago
microsoft / MInference
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…
☆1,080 Updated this week
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,181 Updated 2 weeks ago
princeton-nlp / SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
☆912 Updated 5 months ago
openreasoner / openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
☆1,803 Updated 6 months ago
SimpleBerry / LLaMA-O1
Large Reasoning Models
☆804 Updated 7 months ago
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,536 Updated last year
BytedTsinghua-SIA / DAPO
An Open-source RL System from ByteDance Seed and Tsinghua AIR
☆1,470 Updated 2 months ago
alibaba / Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
☆1,258 Updated 3 weeks ago
open-compass / MixtralKit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
☆767 Updated last year
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,693 Updated 3 months ago
deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,123 Updated 2 weeks ago
NVIDIA / NeMo-Aligner
Scalable toolkit for efficient model alignment
☆833 Updated this week
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
☆2,221 Updated 2 months ago
ByteDance-Seed / Seed-Thinking-v1.5
☆800 Updated last month
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,262 Updated 4 months ago