fattorib / ZeRO-transformer
Two implementations of ZeRO-1 optimizer sharding in JAX
☆13 · Updated last year
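For context, here is a minimal sketch of what ZeRO-1 optimizer sharding means in JAX terms: each data-parallel device keeps Adam moments only for its own slice of the flattened parameters, applies the update to that slice, and all-gathers the result. This is an illustrative toy, not code from this repository; the names (`shard_slice`, `train_step`), the flat-parameter linear model, and the use of `pmap` plus Optax are assumptions made for brevity.

```python
import functools

import jax
import jax.numpy as jnp
import optax

AXIS = "data"                                # name of the data-parallel pmap axis
N_DEV = jax.local_device_count()             # data-parallel world size

def shard_slice(x):
    """Return this device's contiguous slice of a flat vector (ZeRO-1 partitioning)."""
    i = jax.lax.axis_index(AXIS)             # position of this device on the axis
    size = x.shape[0] // N_DEV               # assumes the length divides evenly
    return jax.lax.dynamic_slice(x, (i * size,), (size,))

def loss_fn(params, batch):
    x, y = batch
    return jnp.mean((x @ params - y) ** 2)   # toy linear model with a flat parameter vector

opt = optax.adam(1e-3)                       # the Adam moments are what get sharded

# Each device initializes optimizer state only for its own parameter slice.
init_opt_state = jax.pmap(lambda p: opt.init(shard_slice(p)), axis_name=AXIS)

@functools.partial(jax.pmap, axis_name=AXIS)
def train_step(params, opt_state, batch):
    grads = jax.grad(loss_fn)(params, batch)
    grads = jax.lax.pmean(grads, AXIS)       # average gradients across devices
    g_shard = shard_slice(grads)             # this device's gradient slice
    p_shard = shard_slice(params)            # this device's parameter slice
    updates, opt_state = opt.update(g_shard, opt_state, p_shard)
    p_shard = optax.apply_updates(p_shard, updates)
    # All-gather the updated slices so every device holds full parameters again.
    params = jax.lax.all_gather(p_shard, AXIS).reshape(-1)
    return params, opt_state

if __name__ == "__main__":
    dim = 8 * N_DEV
    key = jax.random.PRNGKey(0)
    params = jnp.broadcast_to(jax.random.normal(key, (dim,)), (N_DEV, dim))
    opt_state = init_opt_state(params)
    batch = (jax.random.normal(key, (N_DEV, 4, dim)), jnp.zeros((N_DEV, 4)))
    params, opt_state = train_step(params, opt_state, batch)
    print(params.shape, jax.tree_util.tree_map(jnp.shape, opt_state))
```

The memory saving this sketch illustrates is the same idea the repository implements at scale: optimizer state for a parameter of size D occupies roughly D/N per device instead of D, at the cost of an all-gather of updated parameters each step.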
Related projects
Alternatives and complementary repositories for ZeRO-transformer
- seqax = sequence modeling + JAX ☆133 · Updated 4 months ago
- ☆73 · Updated 4 months ago
- Experiment of using Tangent to autodiff triton ☆72 · Updated 10 months ago
- ☆77 · Updated 5 months ago
- A simple library for scaling up JAX programs ☆127 · Updated 3 weeks ago
- Minimal but scalable implementation of large language models in JAX ☆26 · Updated 3 weeks ago
- Transformer with Mu-Parameterization, implemented in JAX/Flax. Supports FSDP on TPU pods. ☆29 · Updated 2 weeks ago
- Accelerated First Order Parallel Associative Scan ☆164 · Updated 3 months ago
- extensible collectives library in triton ☆72 · Updated 2 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆211 · Updated 3 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 · Updated 4 months ago
- Triton Implementation of HyperAttention Algorithm ☆46 · Updated 11 months ago
- A set of Python scripts that makes your experience on TPU better ☆40 · Updated 4 months ago
- Cataloging released Triton kernels. ☆138 · Updated 2 months ago
- ring-attention experiments ☆97 · Updated last month
- A library for unit scaling in PyTorch ☆105 · Updated 2 weeks ago
- ☆177 · Updated last week
- ☆132 · Updated last year
- Machine Learning eXperiment Utilities ☆45 · Updated 5 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆185 · Updated last month
- Simple and efficient pytorch-native transformer training and inference (batched) ☆61 · Updated 7 months ago
- LoRA for arbitrary JAX models and functions ☆133 · Updated 8 months ago
- ☆36 · Updated 10 months ago
- ☆50 · Updated 6 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆194 · Updated this week
- JAX bindings for Flash Attention v2 ☆80 · Updated 4 months ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated 3 months ago
- ☆224 · Updated 4 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆107 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 · Updated last year