sholtodouglas / multihost_dataloading

Experimenting with how best to do multi-host dataloading

☆10

Alternatives and similar repositories for multihost_dataloading

Users interested in multihost_dataloading are comparing it to the libraries listed below.

sholtodouglas / scalingExperiments
☆62 · Updated 3 years ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 · Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 · Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 · Updated last year
fattorib / ZeRO-transformer
Two implementations of ZeRO-1 optimizer sharding in JAX
☆14 · Updated 2 years ago
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35 · Updated last year
young-geng / mlxu
Machine Learning eXperiment Utilities
☆46 · Updated 2 months ago
cloneofsimo / min-fsdp
☆91 · Updated last year
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37 · Updated 2 years ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21 · Updated 4 months ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU training
☆51 · Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆45 · Updated last year
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆32 · Updated 4 months ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆60 · Updated this week
Sea-Snell / JAXSeq
Train very large language models in Jax.
☆209 · Updated last year
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54 · Updated last month
cloneofsimo / zeroshampoo
☆34 · Updated last year
codekansas / rwkv
RWKV model implementation
☆38 · Updated 2 years ago
microsoft / mutransformers
Some common Hugging Face transformers in maximal update parametrization (µP)
☆85 · Updated 3 years ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆88 · Updated last year
srush / drop7
☆18 · Updated last year
xrsrke / pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts (still a work in progress)
☆87 · Updated last year
Sea-Snell / JAX_llama
Inference code for LLaMA models in JAX
☆119 · Updated last year
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216 · Updated last year
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52 · Updated last year
sustcsonglin / mamba-triton
☆48 · Updated last year
google-research / jestimator
Amos optimizer with JEstimator lib.
☆82 · Updated last year
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32 · Updated last year
ml-gde / jflux
JAX Implementation of Black Forest Labs' Flux.1 family of models
☆39 · Updated last month
ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆83 · Updated 10 months ago