jxbz / modula
Scalable neural net training via automatic normalization in the modular norm.
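As a rough illustration of the headline idea (and not modula's actual API, which this sketch does not use), normalizing a weight update in an operator norm can be sketched in plain NumPy: rescale each layer's raw gradient so that the update's spectral norm equals the learning rate. Both function names here are hypothetical.

```python
import numpy as np

def spectral_norm(M, iters=50, seed=0):
    """Estimate the largest singular value of M by power iteration."""
    v = np.random.default_rng(seed).standard_normal(M.shape[1])
    for _ in range(iters):
        u = M @ v
        u /= np.linalg.norm(u)
        v = M.T @ u
        v /= np.linalg.norm(v)
    return float(u @ M @ v)

def normalized_step(W, G, lr=0.1):
    """SGD step with the gradient rescaled to unit spectral norm, so the
    update's operator norm is exactly lr. A toy sketch of norm-based
    update normalization, not modula's implementation."""
    return W - lr * G / max(spectral_norm(G), 1e-12)
```

The point of normalizing in an operator norm rather than by a raw learning rate is that the update's effect on layer outputs is then controlled independently of the gradient's scale.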
☆121 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for modula
- LoRA for arbitrary JAX models and functions ☆132 · Updated 8 months ago
- A simple library for scaling up JAX programs ☆127 · Updated 2 weeks ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 · Updated 6 months ago
- Efficient optimizers ☆79 · Updated this week
- Named Tensors for Legible Deep Learning in JAX ☆153 · Updated this week
- Minimal (400 LOC) implementation of maximal (multi-node, FSDP) GPT training ☆113 · Updated 7 months ago
- WIP ☆89 · Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆72 · Updated 9 months ago
- seqax = sequence modeling + JAX ☆133 · Updated 4 months ago
- If it quacks like a tensor... ☆52 · Updated last week
- Understand and test language model architectures on synthetic tasks. ☆162 · Updated 6 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆109 · Updated last week
- Accelerated First Order Parallel Associative Scan ☆163 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 · Updated last week
- Run PyTorch in JAX. 🤝 ☆200 · Updated last year
- Normalized Transformer (nGPT) ☆66 · Updated this week
- Multidimensional indexing for tensors ☆113 · Updated last year
- A library for unit scaling in PyTorch ☆105 · Updated 2 weeks ago
- Simple implementation of muP, based on the Spectral Condition for Feature Learning; SGD only, don't use it with Adam ☆69 · Updated 3 months ago
- JAX Synergistic Memory Inspector ☆164 · Updated 4 months ago
- 94% on CIFAR-10 in 2.6 seconds 💨 96% in 27 seconds ☆177 · Updated last week
- Pytorch-like dataloaders in JAX. ☆59 · Updated last month
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆178 · Updated 5 months ago
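One entry in the list above is the Muon optimizer, whose core operation is orthogonalizing each weight matrix's momentum-smoothed gradient before applying it. As a hedged sketch of that kind of orthogonalization, here is the classic cubic Newton-Schulz iteration in NumPy; Muon itself uses a differently tuned quintic polynomial, so this is an illustration of the principle, not Muon's code.

```python
import numpy as np

def orthogonalize(G, steps=25):
    """Approximate the orthogonal polar factor of G with the classic
    cubic Newton-Schulz iteration X <- 1.5*X - 0.5*(X @ X.T) @ X.
    Scaling by the Frobenius norm first keeps all singular values
    below 1, which guarantees convergence of the iteration."""
    X = G / np.linalg.norm(G)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X
```

The iteration drives every singular value toward 1 while leaving the singular vectors untouched, so the result is the nearest (semi-)orthogonal matrix to the input's direction; the appeal for optimizers is that it uses only matmuls, which run fast on accelerators, instead of an explicit SVD.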