lucaslingle/mu_transformer

Official implementation of 'A Large-Scale Exploration of mu-Transfer' (CoRR 2024)

☆31

Alternatives and similar repositories for mu_transformer

Users that are interested in mu_transformer are comparing it to the libraries listed below.

JesseFarebro / flax-mup
Maximal Update Parametrization (μP) with Flax & Optax.
☆16 • Dec 27, 2023 • Updated 2 years ago
google / drjax
☆19 • Jul 8, 2026 • Updated 3 weeks ago
Arongil / lipschitz-transformers
Don't just regulate gradients like in Muon, regulate the weights too
☆32 • Jul 30, 2025 • Updated 11 months ago
erfanzar / eformer
(EasyDel Former) is a utility library designed to simplify and enhance the development in JAX
☆33 • Jul 23, 2026 • Updated last week
hfarias / mask_galaxy
Morphological segmentation of Galaxies
☆10 • Dec 17, 2021 • Updated 4 years ago
dvruette / gidd-easydel
☆25 • Dec 16, 2025 • Updated 7 months ago
haydn-jones / SOAP_JAX
Unofficial JAX implementation of the SOAP optimizer (https://arxiv.org/abs/2409.11321)
☆29 • Jul 21, 2026 • Updated last week
fattorib / ZeRO-transformer
Two implementations of ZeRO-1 optimizer sharding in JAX
☆14 • Jun 11, 2023 • Updated 3 years ago
Lucasc-99 / NoTorch
A from-scratch neural network and transformers library, with speeds rivaling PyTorch
☆10 • Mar 16, 2025 • Updated last year
anh-tong / nanoGPT-equinox
nanoGPT using Equinox
☆15 • Mar 3, 2023 • Updated 3 years ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆135 • Jul 11, 2025 • Updated last year
graphcore-research / jax-scalify
JAX Scalify: end-to-end scaled arithmetics
☆18 • Oct 30, 2024 • Updated last year
packquickly / schedule_free_optx
Schedule free optimiser implemented in JAX using Optimistix
☆15 • May 29, 2024 • Updated 2 years ago
Stuermer / EchelleSimulator
☆12 • Dec 15, 2022 • Updated 3 years ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆88 • Mar 14, 2022 • Updated 4 years ago
Sike-Wang / low-bit-Shampoo
4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024)
☆13 • Feb 13, 2025 • Updated last year
luyug / magix
Supercharge huggingface transformers with model parallelism.
☆77 • Jul 23, 2025 • Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 • Jun 3, 2025 • Updated last year
jasonge27 / StacklessRayTracer
Implemented stackless KDTree on GPU to accelerate ray tracing rendering algorithm. Hardware level optimizations for register spills local…
☆21 • Oct 6, 2016 • Updated 9 years ago
Scony / pytest-timeouts
Linux-only Pytest plugin to control durations of various test case execution phases
☆12 • Dec 30, 2019 • Updated 6 years ago
evanatyourservice / psgd_jax
Implementation of PSGD optimizer in JAX
☆36 • Dec 31, 2024 • Updated last year
opooladz / Preconditioned-Stochastic-Gradient-Descent
A repo based on XiLin Li's PSGD repo that extends some of the experiments.
☆14 • Oct 7, 2024 • Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 • Apr 17, 2024 • Updated 2 years ago
thomasahle / kanmlps
KANs and MLPs
☆12 • Jun 7, 2024 • Updated 2 years ago
lixilinx / Fully-Trainable-SSM
A fully trainable state space model (SSM)
☆16 • Mar 18, 2025 • Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆19 • Jul 24, 2025 • Updated last year
ucsb-seclab / BullseyePoison
Bullseye Polytope Clean-Label Poisoning Attack
☆18 • Nov 5, 2020 • Updated 5 years ago
GallagherCommaJack / modulax
☆18 • Aug 24, 2024 • Updated last year
koesterlab / setup-slurm-action
A github action to setup a small SLURM cluster for testing purposes.
☆14 • Jul 20, 2025 • Updated last year
liamlio / MolGAN
AI for a cure, a combination of Latent-GAN and VAE-JTNN to create 100% valid drug like molecules
☆10 • Mar 16, 2020 • Updated 6 years ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆195 • Jul 23, 2025 • Updated last year
Algomancer / VCReg
Minimal implementation of VCReg (2024) for collapse prevention.
☆18 • Jan 28, 2025 • Updated last year
huggingface / candle-paged-attention
☆12 • Jan 4, 2024 • Updated 2 years ago
Overworldai / owl-vaes
Weird autoencoder experiments
☆24 • May 20, 2026 • Updated 2 months ago
HazyResearch / embroid
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
☆11 • Aug 12, 2023 • Updated 2 years ago
bhneo / decorrelated_bn
An implementation of DecorrelatedBN by tensorflow
☆13 • Jun 30, 2022 • Updated 4 years ago
JesseFarebro / xtils
A collection of utilities for machine learning experiments.
☆11 • Jan 8, 2026 • Updated 6 months ago
Overworldai / owl-wms
Basic world models
☆33 • Oct 30, 2025 • Updated 8 months ago
sholtodouglas / scalingExperiments
☆62 • Mar 4, 2022 • Updated 4 years ago