JesseFarebro / flax-mup

Maximal Update Parametrization (μP) with Flax & Optax.

☆16

Alternatives and similar repositories for flax-mup

Users who are interested in flax-mup are comparing it to the libraries listed below.

young-geng / scalax
A simple library for scaling up JAX programs
☆144 Updated last month
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆143 Updated last year
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35 Updated last week
evanatyourservice / psgd_jax
Implementation of PSGD optimizer in JAX
☆35 Updated 11 months ago
kvfrans / splus
☆119 Updated 5 months ago
kvfrans / jax-flow
Flow-matching algorithms in JAX
☆111 Updated last year
BirkhoffG / jax-dataloader
PyTorch-like dataloaders for JAX.
☆97 Updated 6 months ago
yixiaoer / einshard
Einsum-like high-level array sharding API for JAX
☆34 Updated last year
radarFudan / mamba-minimal-jax
☆35 Updated last year
ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆84 Updated 11 months ago
davisyoshida / qax
If it quacks like a tensor...
☆59 Updated last year
anh-tong / nanoGPT-equinox
nanoGPT using Equinox
☆14 Updated 2 years ago
lindermanlab / elk
Scalable and Stable Parallelization of Nonlinear RNNs
☆26 Updated last month
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35 Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆92 Updated last year
GallagherCommaJack / modulax
☆17 Updated last year
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆192 Updated last year
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆204 Updated 2 months ago
shikaiqiu / compute-better-spent
☆61 Updated last year
nikhilvyas / SOAP
☆225 Updated last year
google-deepmind / nanodo
☆285 Updated last year
apple / ml-ademamix
☆68 Updated last year
modula-systems / modula
🧱 Modula software package
☆309 Updated 3 months ago
dlwh / jax_sourceror
Turn jitted jax functions back into python source code
☆22 Updated 11 months ago
AakashKumarNain / mistral_jax
This is a port of Mistral-7B model in JAX
☆32 Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 Updated last year
willisma / diffuse_nnx
A comprehensive JAX/NNX library for diffusion and flow matching generative algorithms, featuring DiT (Diffusion Transformer) and its vari…
☆117 Updated last month
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132 Updated 4 months ago
phlippe / jax_trainer
Lightning-like training API for JAX with Flax
☆44 Updated 11 months ago
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it with Adam.
☆85 Updated last year