IEIT-Yuan / Yuan2.0-M32
Mixture-of-Experts (MoE) Language Model
☆194 · Sep 9, 2024 · Updated last year
Alternatives and similar repositories for Yuan2.0-M32
Users who are interested in Yuan2.0-M32 are comparing it to the repositories listed below.
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models · ☆139 · Jun 12, 2024 · Updated last year
- Yuan 2.0 Large Language Model · ☆689 · Jul 11, 2024 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] · ☆149 · Oct 27, 2024 · Updated last year
- ☆79 · May 6, 2024 · Updated last year
- The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking" · ☆19 · Jan 25, 2025 · Updated last year
- Reference implementation of Megalodon 7B model · ☆528 · May 17, 2025 · Updated 9 months ago
- [ACL 2024] Progressive LLaMA with Block Expansion. · ☆514 · May 20, 2024 · Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" · ☆11 · Oct 11, 2024 · Updated last year
- GRadient-INformed MoE · ☆264 · Sep 25, 2024 · Updated last year
- Reaching LLaMA2 Performance with 0.1M Dollars · ☆986 · Jul 23, 2024 · Updated last year
- Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability. · ☆557 · Nov 11, 2024 · Updated last year
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models · ☆1,657 · Mar 8, 2024 · Updated last year
- ☆96 · Dec 6, 2024 · Updated last year
- ☆979 · Feb 7, 2025 · Updated last year
- ☆559 · Aug 16, 2024 · Updated last year
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models · ☆1,894 · Jan 16, 2024 · Updated 2 years ago
- ☆91 · Aug 18, 2024 · Updated last year
- ☆66 · Jul 8, 2025 · Updated 7 months ago
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation · ☆86 · Jul 13, 2024 · Updated last year
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM · ☆60 · May 28, 2024 · Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated · ☆33 · Aug 14, 2024 · Updated last year
- ☆39 · May 20, 2025 · Updated 8 months ago
- ☆18 · Apr 18, 2025 · Updated 9 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs · ☆43 · Jun 28, 2024 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models · ☆35 · Jun 12, 2024 · Updated last year
- Codebase for Instruction Following without Instruction Tuning · ☆36 · Sep 24, 2024 · Updated last year
- ☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models · ☆19 · Jun 4, 2025 · Updated 8 months ago
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" · ☆20 · Apr 9, 2025 · Updated 10 months ago
- ☆22 · Jul 15, 2024 · Updated last year
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment · ☆81 · Jun 19, 2024 · Updated last year
- ☆96 · Oct 8, 2023 · Updated 2 years ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model · ☆56 · Dec 4, 2024 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. · ☆203 · Jul 17, 2024 · Updated last year
- XmodelLM · ☆38 · Nov 19, 2024 · Updated last year
- ☆129 · Oct 1, 2024 · Updated last year
- This repository includes the official implementation of our paper "Scaling White-Box Transformers for Vision" · ☆48 · Jun 3, 2024 · Updated last year
- Large Reasoning Models · ☆806 · Dec 3, 2024 · Updated last year
- ☆129 · Jun 6, 2025 · Updated 8 months ago
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… · ☆104 · Jun 14, 2024 · Updated last year