ronghanghu / vit_10b_fsdp_exampleLinks

See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md

☆25

Alternatives and similar repositories for vit_10b_fsdp_example

Users that are interested in vit_10b_fsdp_example are comparing it to the libraries listed below

Sorting:

huggingface / m4-logs
M4 experiment logbook
☆57Updated 2 years ago
cloneofsimo / min-fsdp
☆91Updated last year
mgmalek / efficient_cross_entropy
☆121Updated last year
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100Updated last year
jiasenlu / LL3M
LL3M: Large Language and Multi-Modal Model in Jax
☆74Updated last year
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆276Updated 3 years ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
rom1504 / gpu-tester
gpu tester detects broken and slow gpus in a cluster
☆72Updated 2 years ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86Updated 3 years ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 3 months ago
apoorvkh / torchrunx
Easily run PyTorch on multiple GPUs & machines
☆47Updated 2 weeks ago
hadasah / btm
☆76Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆69Updated last year
srush / do-we-need-attention
☆166Updated 2 years ago
huggingface / OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…
☆206Updated last year
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216Updated last year
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆159Updated last year
epfml / dynamic-sparse-flash-attention
☆149Updated 2 years ago
young-geng / mlxu
Machine Learning eXperiment Utilities
☆46Updated 2 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233Updated last month
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆120Updated last year
samsja / muon_fsdp_2
Muon fsdp 2
☆44Updated 2 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84Updated 11 months ago
kyo-takano / chinchilla
A toolkit for scaling law research ⚖
☆52Updated 9 months ago
SeanNaren / minGPT
A minimal PyTorch Lightning OpenAI GPT w DeepSpeed Training!
☆113Updated 2 years ago
rom1504 / python-template
Simple python template
☆42Updated last year
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241Updated 4 months ago
cloneofsimo / scaling-guide
WIP
☆93Updated last year
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆219Updated last year
Sea-Snell / JAXSeq
Train very large language models in Jax.
☆209Updated 2 years ago