vatsal0/default-moe

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/vatsal0/default-moe)

vatsal0 / default-moe

☆19

Alternatives and similar repositories for default-moe

Users that are interested in default-moe are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ZIB-IOL / SMS
View on GitHub
Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"
☆12Oct 14, 2025Updated 9 months ago
deepspeedai / deepspeed-gpt-neox
View on GitHub
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
☆21Nov 28, 2022Updated 3 years ago
zaydzuhri / flame
View on GitHub
Fork of Flame repo for training of some new stuff in development
☆20Jul 15, 2026Updated last week
longrongyang / STGC
View on GitHub
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
☆13Feb 11, 2025Updated last year
aws-neuron / nki-llama
View on GitHub
Project showing how to develop NKI kernels for Llama 3.2 1B inference
☆21May 29, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
drarijitdas / Natural-GaLore
View on GitHub
An extention to the GaLore paper, to perform Natural Gradient Descent in low rank subspace
☆19Oct 21, 2024Updated last year
dmis-lab / Monet
View on GitHub
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
☆79Jun 23, 2025Updated last year
RenzeLou / Muffin
View on GitHub
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
☆16Oct 31, 2024Updated last year
Liang137 / DPZero
View on GitHub
[ICML 2024] DPZero: Private Fine-Tuning of Language Models without Backpropagation
☆17Sep 4, 2024Updated last year
Zzzzz1 / CSKD
View on GitHub
Official code for Cumulative Spatial Knowledge Distillation for Vision Transformers (ICCV-2023) https://openaccess.thecvf.com/content/ICC…
☆15Nov 5, 2023Updated 2 years ago
lighttransport / japanese-llama-experiment
View on GitHub
Japanese LLaMa experiment
☆54Dec 27, 2025Updated 6 months ago
yanhong-lbh / text_or_pixels
View on GitHub
Codebase for EMNLP 2025 Findings paper "Text or Pixels? Evaluating Efficiency and Understanding of LLMs with Visual Text Inputs"
☆19Nov 14, 2025Updated 8 months ago
DavidFanzz / SCMoE
View on GitHub
☆29May 24, 2024Updated 2 years ago
qiuzh20 / RMoE
View on GitHub
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆33Aug 4, 2024Updated last year
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
aiha-lab / TSLD
View on GitHub
[NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
☆18Dec 6, 2023Updated 2 years ago
thu-ml / ReMoE
View on GitHub
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆118Dec 20, 2024Updated last year
matttreed / diloco-sim
View on GitHub
☆23Jan 5, 2025Updated last year
thunlp / SparsingLaw
View on GitHub
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆32Nov 12, 2024Updated last year
mayank31398 / ladder-residual-inference
View on GitHub
☆14Jul 13, 2025Updated last year
swiss-ai / MoE
View on GitHub
some mixture of experts architecture implementations
☆27Mar 22, 2024Updated 2 years ago
StarDewXXX / AdaR1
View on GitHub
The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆24May 6, 2026Updated 2 months ago
yining610 / dynamic-reward-weighting
View on GitHub
Official implementation of paper "Learning to Optimize Multi-objective Alignment Through Dynamic Reward Weighting"
☆28Dec 31, 2025Updated 6 months ago
morecry / CharacterChat
View on GitHub
repository for CharacterChat, a personalized social support system
☆75Jul 13, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
kdavis-mozilla / iris
View on GitHub
Demo WebApp using Kaldi DNN engine to convert speech to text
☆11Jun 12, 2016Updated 10 years ago
Algolzw / self-rewarding-smc
View on GitHub
Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models
☆16Feb 17, 2026Updated 5 months ago
ryuryukke / japanese_summarizer
View on GitHub
A summarizer for Japanese articles (but ChatGPT is better)
☆10Aug 1, 2022Updated 3 years ago
kunncheng / Diff-MoE
View on GitHub
[ICML 2025] Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts
☆34Nov 10, 2025Updated 8 months ago
YuukiOgino / VoicevoxEngineForUE
View on GitHub
UnrealEngine5版VOICEVOX Engine
☆13Nov 29, 2025Updated 7 months ago
stoneMo / SLAVC
View on GitHub
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
☆21Dec 6, 2022Updated 3 years ago
gau-nernst / quantized-training
View on GitHub
Explore training for quantized models
☆26Jul 12, 2025Updated last year
mzf666 / LORO-main
View on GitHub
Official implementation of ICLR 2025 'LORO: Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization'
☆17Apr 24, 2025Updated last year
apple / ml-rl-dllm
View on GitHub
Repository companioning the paper "Learning Unmasking Policies for Diffusion Language Models"
☆17Mar 30, 2026Updated 3 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
ParCIS / Ok-Topk
View on GitHub
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆27Dec 10, 2022Updated 3 years ago
RobertCsordas / moeut
View on GitHub
☆93Aug 18, 2024Updated last year
eyalbd2 / RL-based-Language-Modeling
View on GitHub
☆13Jan 27, 2019Updated 7 years ago
YYX666660 / LAVSS
View on GitHub
Code for LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
☆19Feb 25, 2025Updated last year
luongthecong123 / fp8-quant-matmul
View on GitHub
Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.
☆19Feb 9, 2026Updated 5 months ago
thu-ml / Jetfire-INT8Training
View on GitHub
☆63Jul 21, 2024Updated 2 years ago
liuanji / CoDD
View on GitHub
Official implementation of "Breaking the Factorization Barrier in Diffusion Language Models"
☆17Mar 27, 2026Updated 3 months ago