ClashLuke / tpucareLinks

Automatically take good care of your preemptible TPUs

☆36

Alternatives and similar repositories for tpucare

Users that are interested in tpucare are comparing it to the libraries listed below

Sorting:

HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintable TPU-Training
☆50Updated last year
cloneofsimo / min-fsdp
☆82Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆130Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79Updated last year
borisdayma / clip-jax
Train vision models using JAX and 🤗 transformers
☆98Updated 3 months ago
shikaiqiu / compute-better-spent
☆53Updated 9 months ago
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35Updated 2 years ago
cloneofsimo / zeroshampoo
☆34Updated 10 months ago
dvruette / barrel-rec-pytorch
☆53Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆85Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 7 months ago
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68Updated 2 years ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆89Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated 10 months ago
ethansmith2000 / TransformerExperiments
☆19Updated 2 months ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆82Updated 3 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆68Updated last year
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21Updated 2 months ago
epfml / DenseFormer
☆81Updated last year
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, dont use it for Adam
☆84Updated last year
sholtodouglas / scalingExperiments
☆61Updated 3 years ago
google-research / precondition
☆31Updated last month
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
google-research / jestimator
Amos optimizer with JEstimator lib.
☆82Updated last year
jxiw / BiGS
Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆114Updated last year
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆140Updated last year
cat-state / tinypar
☆20Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54Updated last year
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆101Updated last week