er537 / MuP

☆10

Alternatives and similar repositories for MuP

Users who are interested in MuP are comparing it to the libraries listed below.

ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83 Updated 10 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆130 Updated 3 months ago
berlino / seq_icl
☆53 Updated last year
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35 Updated last month
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189 Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆166 Updated 3 months ago
young-geng / scalax
A simple library for scaling up JAX programs
☆144 Updated 11 months ago
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆107 Updated 5 months ago
LIONS-EPFL / scion
☆41 Updated last month
MatX-inc / seqax
seqax = sequence modeling + JAX
☆167 Updated 2 months ago
google-deepmind / nanodo
☆283 Updated last year
modula-systems / modula
🧱 Modula software package
☆287 Updated 2 months ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241 Updated 4 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆131 Updated 10 months ago
HomebrewML / HeavyBall
Efficient optimizers
☆274 Updated this week
apple / ml-ademamix
☆67 Updated 11 months ago
nikhilvyas / SOAP
☆217 Updated 10 months ago
cloneofsimo / min-fsdp
☆91 Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233 Updated 3 weeks ago
evanatyourservice / psgd_jax
Implementation of PSGD optimizer in JAX
☆33 Updated 9 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆86 Updated 3 weeks ago
shikaiqiu / compute-better-spent
☆58 Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 Updated last year
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆141 Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆70 Updated last year
JesseFarebro / flax-mup
Maximal Update Parametrization (μP) with Flax & Optax.
☆16 Updated last year
lixilinx / psgd_torch
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆188 Updated this week
cloneofsimo / scaling-guide
WIP
☆93 Updated last year
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52 Updated last year