IEIT-Yuan / Yuan2.0-M32

Mixture-of-Experts (MoE) Language Model

☆194

Alternatives and similar repositories for Yuan2.0-M32

Users who are interested in Yuan2.0-M32 are comparing it to the repositories listed below.

SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆139 · Jun 12, 2024 · Updated last year
IEIT-Yuan / Yuan-2.0
Yuan 2.0 Large Language Model
☆689 · Jul 11, 2024 · Updated last year
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆149 · Oct 27, 2024 · Updated last year
xverse-ai / XVERSE-V-13B
☆79 · May 6, 2024 · Updated last year
ttw1018 / MoPE-DST
The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking"
☆19 · Jan 25, 2025 · Updated last year
XuezheMax / megalodon
Reference implementation of Megalodon 7B model
☆528 · May 17, 2025 · Updated 9 months ago
TencentARC / LLaMA-Pro
[ACL 2024] Progressive LLaMA with Block Expansion.
☆514 · May 20, 2024 · Updated last year
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
microsoft / GRIN-MoE
GRadient-INformed MoE
☆264 · Sep 25, 2024 · Updated last year
myshell-ai / JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
☆986 · Jul 23, 2024 · Updated last year
01-ai / Yi-1.5
Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability.
☆557 · Nov 11, 2024 · Updated last year
XueFuzhao / OpenMoE
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,657 · Mar 8, 2024 · Updated last year
Bui1dMySea / MemLong
☆96 · Dec 6, 2024 · Updated last year
multimodal-art-projection / MAP-NEO
☆979 · Feb 7, 2025 · Updated last year
Snowflake-Labs / snowflake-arctic
☆559 · Aug 16, 2024 · Updated last year
deepseek-ai / DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
☆1,894 · Jan 16, 2024 · Updated 2 years ago
RobertCsordas / moeut
☆91 · Aug 18, 2024 · Updated last year
OpenSparseLLMs / Linearization
☆66 · Jul 8, 2025 · Updated 7 months ago
Mowenyii / PAE
[CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation
☆86 · Jul 13, 2024 · Updated last year
Zoeyyao27 / SirLLM
This repository contains the code for the paper "SirLLM: Streaming Infinite Retentive LLM"
☆60 · May 28, 2024 · Updated last year
nanowell / Q-Sparse-LLM
My implementation of "Q-Sparse: All Large Language Models can be Fully Sparsely-Activated"
☆33 · Aug 14, 2024 · Updated last year
SJTU-DENG-Lab / UniCMs
☆39 · May 20, 2025 · Updated 8 months ago
SLIT-AI / FuseChat-3.0
☆18 · Apr 18, 2025 · Updated 9 months ago
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43 · Jun 28, 2024 · Updated last year
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35 · Jun 12, 2024 · Updated last year
john-hewitt / implicit-ins
Codebase for "Instruction Following without Instruction Tuning"
☆36 · Sep 24, 2024 · Updated last year
linhaowei1 / kumo
☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models
☆19 · Jun 4, 2025 · Updated 8 months ago
tianyi-lab / C3PO
[COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆20 · Apr 9, 2025 · Updated 10 months ago
Ginjing-Yuan / QWen2-from_ground_up
☆22 · Jul 15, 2024 · Updated last year
QwenLM / online_merging_optimizers
Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment"
☆81 · Jun 19, 2024 · Updated last year
OpenNLG / OpenBA
☆96 · Oct 8, 2023 · Updated 2 years ago
Cranial-XIX / longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆56 · Dec 4, 2024 · Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆203 · Jul 17, 2024 · Updated last year
XiaoduoAILab / XmodelLM
XmodelLM
☆38 · Nov 19, 2024 · Updated last year
SALT-NLP / demonstrated-feedback
☆129 · Oct 1, 2024 · Updated last year
UCSC-VLAA / CRATE-alpha
This repository includes the official implementation of our paper "Scaling White-Box Transformers for Vision"
☆48 · Jun 3, 2024 · Updated last year
SimpleBerry / LLaMA-O1
Large Reasoning Models
☆806 · Dec 3, 2024 · Updated last year
OpenSparseLLMs / Linear-MoE
☆129 · Jun 6, 2025 · Updated 8 months ago
YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆104 · Jun 14, 2024 · Updated last year
