SkyworkAI / Skywork-MoELinks

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

☆137

Alternatives and similar repositories for Skywork-MoE

Users that are interested in Skywork-MoE are comparing it to the libraries listed below

Sorting:

YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆123Updated 9 months ago
bigai-nlco / TokenSwift
[ICML 2025] |TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
☆115Updated 5 months ago
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆191Updated 3 weeks ago
yyht / openrlhf_async_pipline
☆82Updated 2 months ago
FreedomIntelligence / FastLLM
Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];
☆41Updated last year
IEIT-Yuan / Yuan2.0-M32
Mixture-of-Experts (MoE) Language Model
☆189Updated last year
inclusionAI / Ling-V2
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI.
☆151Updated 2 weeks ago
SkyworkAI / skywork-o1-prm-inference
☆65Updated 10 months ago
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆110Updated 7 months ago
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆189Updated 7 months ago
18907305772 / FuseAI
FuseAI Project
☆87Updated 8 months ago
Glaciohound / LM-Infinite
Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆149Updated 7 months ago
GAIR-NLP / ReAlign
Reformatted Alignment
☆112Updated last year
LLM360 / MegaMath
[COLM 2025] An Open Math Pre-trainng Dataset with 370B Tokens.
☆103Updated 6 months ago
OpenSparseLLMs / Linear-MoE
☆119Updated 4 months ago
zms1999 / SmartMoE
A MoE impl for PyTorch, [ATC'23] SmartMoE
☆71Updated 2 years ago
shizhediao / Post-Training-Data-Flywheel
We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.
☆60Updated last year
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆173Updated last year
OpenLMLab / scaling-rope
code for Scaling Laws of RoPE-based Extrapolation
☆73Updated 2 years ago
RUC-GSAI / YuLan-Mini
A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.
☆219Updated 2 months ago
TemporaryLoRA / Temp-LoRA
☆116Updated last year
Bui1dMySea / MemLong
☆95Updated 10 months ago
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆256Updated 10 months ago
GAIR-NLP / ProX
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆263Updated 3 months ago
microsoft / RedStone
The RedStone repository includes code for preparing extensive datasets used in training large language models.
☆142Updated 3 months ago
Tencent / llm.hunyuan.T1
☆84Updated 6 months ago
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆132Updated 2 years ago
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.
☆197Updated 4 months ago
OpenBMB / Eurus
☆319Updated last year
FlagOpen / Infinity-Instruct
☆49Updated last year