ronghanghu / vit_10b_fsdp_example
See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md
Related projects
Alternatives and complementary repositories for vit_10b_fsdp_example
- Simple and efficient PyTorch-native transformer training and inference (batched)
- LL3M: Large Language and Multi-Modal Model in Jax
- M4 experiment logbook
- Experiment of using Tangent to autodiff Triton
- A simple library for scaling up JAX programs
- Language models scale reliably with over-training and on downstream tasks
- GPU tester that detects broken and slow GPUs in a cluster
- Implementation of Infini-Transformer in PyTorch
- JAX bindings for Flash Attention v2
- CUDA implementation of autoregressive linear attention, incorporating the latest research findings
- A fast implementation of T5/UL2 in PyTorch using Flash Attention
- Configuration with dataclasses + YAML + argparse; a fork of Pyrallis
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton
- A set of Python scripts that makes your experience on TPU better
- A library for unit scaling in PyTorch
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
- HomebrewNLP in JAX flavour for maintainable TPU training
- Minimal (400 LOC) implementation of maximal (multi-node, FSDP) GPT training
- A minimal PyTorch Lightning OpenAI GPT with DeepSpeed training
- CUDA and Triton implementations of Flash Attention with SoftmaxN
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets