huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
☆317 · Updated 2 months ago
Alternatives and similar repositories for gpu-fryer
Users that are interested in gpu-fryer are comparing it to the libraries listed below
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆455 · Updated last week
- Load compute kernels from the Hub ☆327 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆196 · Updated 5 months ago
- 👷 Build compute kernels ☆178 · Updated last week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆305 · Updated 3 weeks ago
- PyTorch Single Controller ☆901 · Updated this week
- ☆89 · Updated last year
- Scalable and Performant Data Loading ☆335 · Updated this week
- ☆225 · Updated last month
- Dion optimizer algorithm ☆384 · Updated last week
- Simple MPI implementation for prototyping or learning ☆289 · Updated 3 months ago
- ☆178 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆272 · Updated 2 weeks ago
- Inference server benchmarking tool ☆128 · Updated last month
- A tool to configure, launch and manage your machine learning experiments. ☆208 · Updated this week
- Best practices & guides on how to write distributed PyTorch training code ☆540 · Updated last month
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆691 · Updated last week
- ☆267 · Updated this week
- ☆218 · Updated 10 months ago
- ☆112 · Updated 2 months ago
- PyTorch-native post-training at scale ☆546 · Updated this week
- ☆233 · Updated 4 months ago
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism. ☆104 · Updated last month
- Small scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆151 · Updated 2 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆583 · Updated 3 months ago
- Quantized LLM training in pure CUDA/C++. ☆216 · Updated this week
- Slides, notes, and materials for the workshop ☆334 · Updated last year
- Simple & Scalable Pretraining for Neural Architecture Research ☆300 · Updated 3 weeks ago
- Learn CUDA with PyTorch ☆111 · Updated last week
- For optimization algorithm research and development. ☆547 · Updated last week