stas00 / ml-ways
ML/DL Math and Method notes
☆57 Updated 11 months ago
Related projects
Alternatives and complementary repositories for ml-ways
- This repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆85 Updated last year
- Experiment in using Tangent to autodiff Triton ☆71 Updated 9 months ago
- Collection of autoregressive model implementations ☆66 Updated this week
- Make triton easier ☆41 Updated 4 months ago
- Supercharge huggingface transformers with model parallelism. ☆74 Updated last month
- ☆76 Updated 5 months ago
- Utilities for Training Very Large Models ☆56 Updated last month
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆117 Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆12 Updated last month
- TorchFix - a linter for PyTorch-using code with autofix support ☆98 Updated last month
- ☆72 Updated 4 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆84 Updated 2 months ago
- ☆76 Updated 6 months ago
- Genalog is an open-source, cross-platform Python package allowing generation of synthetic document images with custom degradations and te… ☆42 Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆83 Updated last week
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 Updated 4 months ago
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ☆80 Updated 10 months ago
- Automatically take good care of your preemptible TPUs ☆31 Updated last year
- Utilities for PyTorch distributed ☆23 Updated last year
- ☆133 Updated 9 months ago
- ☆35 Updated 7 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆86 Updated 3 months ago
- ☆20 Updated last year
- 🤝 Trade any tensors over the network ☆30 Updated last year
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 Updated 6 months ago
- ☆20 Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 Updated 5 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 Updated 3 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆43 Updated this week