ElanaPearl / pytorch-mps-noncontiguous-bug
☆28 · Updated 2 months ago
Alternatives and similar repositories for pytorch-mps-noncontiguous-bug
Users interested in pytorch-mps-noncontiguous-bug are comparing it to the libraries listed below.
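The repo name suggests it demonstrates incorrect results from non-contiguous tensors on PyTorch's MPS (Apple Silicon) backend. As background, a minimal sketch of what "non-contiguous" means and the common `.contiguous()` workaround (the specific failing op in the repo is not known from this listing, so none is reproduced here):

```python
import torch

# A transpose returns a view: same storage, permuted strides, so the
# elements are no longer laid out contiguously in memory.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
y = x.t()
print(x.is_contiguous(), y.is_contiguous())  # True False

# Common workaround pattern for backend bugs of this kind: materialize a
# contiguous copy before dispatching the op. Falls back to CPU when the
# MPS backend is unavailable, so the snippet runs anywhere.
device = "mps" if torch.backends.mps.is_available() else "cpu"
z = y.contiguous().to(device)
print(z.is_contiguous())  # True
```

`.contiguous()` copies the data into standard row-major layout, which sidesteps kernels that mishandle arbitrary strides.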
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆462 · Updated this week
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ☆153 · Updated 2 years ago
- 🧱 Modula software package ☆316 · Updated 4 months ago
- For optimization algorithm research and development ☆555 · Updated last week
- Where GPUs get cooked 👩🍳🔥 ☆343 · Updated 3 months ago
- Scalable and performant data loading ☆356 · Updated last week
- PyTorch single controller ☆932 · Updated last week
- A library to analyze PyTorch traces ☆452 · Updated 2 weeks ago
- Simple Transformer in JAX ☆141 · Updated last year
- Dion optimizer algorithm ☆411 · Updated last week
- MoE training for Me and You and maybe other people ☆298 · Updated 2 weeks ago
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel…) ☆397 · Updated 6 months ago
- Solve puzzles. Learn CUDA. ☆64 · Updated 2 years ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆336 · Updated last month
- Puzzles for exploring transformers ☆380 · Updated 2 years ago
- seqax = sequence modeling + JAX ☆169 · Updated 5 months ago
- ring-attention experiments ☆160 · Updated last year
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism ☆153 · Updated last month
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate ☆697 · Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆791 · Updated this week