huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
⭐294 · Updated last month
Alternatives and similar repositories for gpu-fryer
Users interested in gpu-fryer are comparing it to the repositories listed below
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ⭐441 · Updated last week
- 👷 Build compute kernels ⭐163 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ⭐194 · Updated 5 months ago
- Load compute kernels from the Hub ⭐308 · Updated last week
- PyTorch Single Controller ⭐840 · Updated this week
- ⭐174 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ⭐58 · Updated 3 weeks ago
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ⭐301 · Updated this week
- Simple MPI implementation for prototyping or learning ⭐287 · Updated 2 months ago
- ⭐225 · Updated 2 weeks ago
- Inference server benchmarking tool ⭐121 · Updated last month
- Scalable and Performant Data Loading ⭐330 · Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ⭐666 · Updated last week
- Dion optimizer algorithm ⭐374 · Updated last month
- A tool to configure, launch and manage your machine learning experiments. ⭐203 · Updated this week
- ⭐218 · Updated 9 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐270 · Updated 3 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ⭐137 · Updated last month
- Best practices & guides on how to write distributed PyTorch training code ⭐526 · Updated last week
- ⭐89 · Updated last year
- PyTorch-native post-training at scale ⭐479 · Updated this week
- Slides, notes, and materials for the workshop ⭐333 · Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ⭐147 · Updated 2 years ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ⭐387 · Updated 4 months ago
- ⭐262 · Updated 2 weeks ago
- Learn CUDA with PyTorch ⭐95 · Updated last month
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ⭐66 · Updated 7 months ago
- ⭐231 · Updated 4 months ago
- Quantized LLM training in pure CUDA/C++. ⭐209 · Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ⭐425 · Updated 7 months ago