openai / chz

☆179

Alternatives and similar repositories for chz

Users who are interested in chz are comparing it to the libraries listed below.

cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 Updated last year
cloneofsimo / min-fsdp
☆91 Updated last year
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆97 Updated last week
young-geng / scalax
A simple library for scaling up JAX programs
☆144 Updated last year
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54 Updated last month
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆170 Updated 4 months ago
modula-systems / modula
🧱 Modula software package
☆299 Updated 2 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆234 Updated last month
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168 Updated 3 months ago
cloneofsimo / scaling-guide
WIP
☆93 Updated last year
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆141 Updated last year
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132 Updated 3 months ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆193 Updated last year
apple / ml-planner
☆56 Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86 Updated 3 years ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100 Updated last year
Sea-Snell / JAXSeq
Train very large language models in Jax.
☆209 Updated 2 years ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆132 Updated 10 months ago
google-research / kauldron
Modular, scalable library to train ML models
☆168 Updated this week
facebookresearch / moodist
moodist
☆22 Updated last month
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216 Updated last year
google-deepmind / nanodo
☆283 Updated last year
microsoft / ReinMax
Beyond Straight-Through
☆102 Updated 2 years ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241 Updated 4 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆94 Updated last month
m-a-n-i-f-e-s-t / power-attention
Attention Kernels for Symmetric Power Transformers
☆121 Updated last month
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 11 months ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83 Updated 10 months ago
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆32 Updated 4 months ago