swiss-ai/MoE

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/swiss-ai/MoE)

swiss-ai / MoE

some mixture of experts architecture implementations

☆27

Alternatives and similar repositories for MoE

Users that are interested in MoE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ananthu-aniraj / masking_strategies_bias_removal
View on GitHub
Masking Strategies for Background Bias Removal in Computer Vision Models (ICCVW OODCV 2023 paper)
☆16Jul 3, 2025Updated last year
garcinc / noised-topk
View on GitHub
[ICML2022] Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification
☆12Jan 30, 2024Updated 2 years ago
epfl-dlab / LAMEN
View on GitHub
☆17May 3, 2024Updated 2 years ago
apd10 / universal_memory_allocation
View on GitHub
☆15Apr 26, 2022Updated 4 years ago
epfml / llm-baselines
View on GitHub
nanoGPT-like codebase for LLM training
☆118Nov 7, 2025Updated 8 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
robertdvdk / part_detection
View on GitHub
Code accompanying PDiscoNet: Semantically consistent part discovery for fine-grained recognition
☆15Dec 10, 2023Updated 2 years ago
illuin-tech / modernvbert
View on GitHub
ModernVBERT is a 250M-parameter vision–language encoder that aligns a text-encoder (Ettin-150M) with a vision-encoder (SigLIP2-B) through…
☆16Oct 16, 2025Updated 9 months ago
ananthu-aniraj / pdiscoformer
View on GitHub
[ECCV 2024 Oral] Official implementation of the paper "PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers"
☆21Apr 9, 2026Updated 3 months ago
lucidrains / sinkhorn-router-pytorch
View on GitHub
Self contained pytorch implementation of a sinkhorn based router, for mixture of experts or otherwise
☆40Aug 29, 2024Updated last year
agwaBom / towards_moe
View on GitHub
Implementation of "Towards Understanding Mixture of Experts in Deep Learning", NeurIPS 2022
☆10Jan 6, 2023Updated 3 years ago
oguiza / DataAugmentation
View on GitHub
☆12Mar 16, 2022Updated 4 years ago
deepspeedai / deepspeed-gpt-neox
View on GitHub
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
☆21Nov 28, 2022Updated 3 years ago
erosenfeld / disagree_discrep
View on GitHub
Provably (and non-vacuously) bounding test error of deep neural networks under distribution shift with unlabeled test data.
☆10Feb 27, 2024Updated 2 years ago
binxio / ec2-boot-mount-ebs-volume
View on GitHub
Mount an EBS Volume and updates /etc/fstab on ec2 instance boot
☆16Jan 28, 2019Updated 7 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
mklissa / dceo
View on GitHub
Learning diverse options through the Laplacian representation.
☆23Jan 5, 2024Updated 2 years ago
EleutherAI / nanoGPT-mup
View on GitHub
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆200Jan 19, 2026Updated 6 months ago
gty111 / GEMM_MMA
View on GitHub
Optimize GEMM with tensorcore step by step
☆40Dec 17, 2023Updated 2 years ago
neilliang90 / Sadam
View on GitHub
☆14Aug 28, 2019Updated 6 years ago
zkx06111 / ALGO
View on GitHub
☆36May 25, 2023Updated 3 years ago
rainlanguage / rainlang
View on GitHub
Solidity library for implementing Rain compatible interpreters.
☆15Jul 16, 2026Updated last week
Zyphra / zcookbook
View on GitHub
Training hybrid models for dummies.
☆31Nov 1, 2025Updated 8 months ago
op-rs / durin
View on GitHub
A Rust library for creating solvers in the OP Stack's dispute protocol
☆19Jan 15, 2024Updated 2 years ago
Debatrix / DFSNet
View on GitHub
An recognition oriented deep learning framework for biometric sample quality assessment
☆12Aug 24, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
nestordemeure / AdaHessianJax
View on GitHub
Jax implementation of the AdaHessian optimizer
☆19Mar 11, 2021Updated 5 years ago
wafer-ai / chipbenchmark
View on GitHub
a platform for monitoring the chip situation
☆16Jul 19, 2025Updated last year
albertozeni / starlight
View on GitHub
Starlight: A Kernel Optimizer for GPU Processing
☆16Jan 10, 2024Updated 2 years ago
nestordemeure / ManifoldMixup
View on GitHub
Manifold-Mixup implementation for fastai V1
☆19Oct 1, 2020Updated 5 years ago
mooyoul / serverless-latest-layer-version
View on GitHub
A serverless plugin that replaces 'latest' version tag to actual lambda layer version
☆25Updated this week
edvinli / federated-learning-mixture
View on GitHub
Federated learning using a mixture of experts
☆17Feb 16, 2021Updated 5 years ago
thomasgauthier / LLM-self-play
View on GitHub
Minimal implementation of the Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models paper (ArXiv 20232401.01335)
☆29Mar 1, 2024Updated 2 years ago
skolai / fewbit
View on GitHub
Compression schema for gradients of activations in backward pass
☆45Jul 26, 2023Updated 3 years ago
annahdo / implementing_activation_steering
View on GitHub
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆24Oct 18, 2024Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
DCDmllm / HyperLLaVA
View on GitHub
Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28Mar 22, 2024Updated 2 years ago
devansh20la / LPF-SGD
View on GitHub
☆17Dec 11, 2022Updated 3 years ago
lessw2020 / mrnet-fastai
View on GitHub
Deep Learning CNN using FastAI for the Stanford MRNet Knee MRI diagnosis challenge
☆16May 18, 2019Updated 7 years ago
kyleliang919 / Long-context-transformers
View on GitHub
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Mar 22, 2023Updated 3 years ago
isaaccorley / mlp-mixer-pytorch
View on GitHub
PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)
☆31May 13, 2021Updated 5 years ago
ParCIS / Ok-Topk
View on GitHub
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆27Dec 10, 2022Updated 3 years ago
uclaml / MoE
View on GitHub
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆35Dec 12, 2023Updated 2 years ago