yaof20/DenseMixer

yaof20 / DenseMixer

Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient

☆68

Alternatives and similar repositories for DenseMixer

Users who are interested in DenseMixer compare it to the repositories listed below.

yaof20 / ReaL
Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback"
☆42 · Jul 21, 2025 · Updated last year
thunlp / BlockFFN
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19 · Jan 10, 2026 · Updated 6 months ago
cat-state / modded-nanogpt-moe
☆17 · Sep 6, 2025 · Updated 10 months ago
thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆32 · Nov 12, 2024 · Updated last year
imoneoi / bf16_fused_adam
BFloat16 Fused Adam Operator for PyTorch
☆20 · Nov 16, 2024 · Updated last year
microsoft / SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
☆30 · Jul 2, 2024 · Updated 2 years ago
Farseer-Scaling-Law / Farseer
☆21 · Jun 12, 2025 · Updated last year
yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performance drop.
☆306 · Nov 7, 2025 · Updated 8 months ago
allenai / FlexOlmo
Code and training scripts for FlexOlmo
☆151 · Apr 20, 2026 · Updated 3 months ago
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆228 · Nov 27, 2025 · Updated 7 months ago
xinmei9322 / semicrowd
Code for Semi-crowdsourced Clustering with Deep Generative Models
☆12 · Dec 9, 2022 · Updated 3 years ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆339 · Mar 31, 2026 · Updated 3 months ago
reka-ai / rekaquant
☆63 · Jul 10, 2025 · Updated last year
ZihanWang314 / CoE
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
☆231 · Nov 4, 2025 · Updated 8 months ago
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
chen-hao-chao / mdm-prime-v2
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models
☆27 · May 23, 2026 · Updated 2 months ago
open-lm-engine / lm-engine
LM engine is a library for pretraining/finetuning LLMs
☆184 · Updated this week
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆118 · Dec 20, 2024 · Updated last year
Zyphra / zcookbook
Training hybrid models for dummies.
☆31 · Nov 1, 2025 · Updated 8 months ago
StigLidu / TURN
[ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"
☆23 · Feb 16, 2025 · Updated last year
stanford-oval / sliders
Repository for paper: Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
☆27 · Apr 27, 2026 · Updated 2 months ago
aHapBean / xHC
[Tech Report] Expanded Hyper-Connections
☆47 · Updated this week
BaohaoLiao / frac-cot
[COLM 2026] An efficient 3D sampling method for long-CoT LLM.
☆16 · May 25, 2025 · Updated last year
shengliu66 / FractionalReason
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
SalesforceAIResearch / LeastLoadedEP
☆18 · Jun 2, 2026 · Updated last month
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆61 · May 20, 2024 · Updated 2 years ago
TsinghuaC3I / ZEDA
Post-Trained MoE Can Skip Half Experts via Self-Distillation
☆38 · May 19, 2026 · Updated 2 months ago
EvanZhuang / knowledge_flow
Official Implementation of Knowledge Flow Prompting
☆35 · Oct 20, 2025 · Updated 9 months ago
arcee-ai / trinity-large-tech-report
☆126 · Feb 19, 2026 · Updated 5 months ago
tanyuqian / cappy
NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
☆49 · Mar 29, 2024 · Updated 2 years ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
allenai / signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
☆31 · Aug 19, 2025 · Updated 11 months ago
LLM360 / Reasoning360
A repo for open research on building large reasoning models
☆151 · Jul 3, 2026 · Updated 2 weeks ago
inclusionAI / GroveMoE
☆24 · Aug 20, 2025 · Updated 11 months ago
HosseinZaredar / Transformer-from-Scratch
Transformer from Scratch in PyTorch
☆18 · Mar 26, 2022 · Updated 4 years ago
OpenSparseLLMs / MoM
☆139 · Feb 4, 2026 · Updated 5 months ago
nikhilchandak / answer-matching
Code for 'Answer Matching Outperforms Multiple Choice for Language Model Evaluation' paper
☆18 · Jul 4, 2025 · Updated last year
kvfrans / matrix-whitening
Code for "What really matters in matrix-whitening optimizers?"
☆25 · Oct 31, 2025 · Updated 8 months ago
OpenSparseLLMs / Linear-MoE
☆139 · Jun 6, 2025 · Updated last year