huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
★326 · Updated 2 months ago
Alternatives and similar repositories for gpu-fryer
Users interested in gpu-fryer are comparing it to the libraries listed below.
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ★456 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ★196 · Updated 6 months ago
- Load compute kernels from the Hub · ★348 · Updated last week
- 👷 Build compute kernels · ★192 · Updated last week
- PyTorch Single Controller · ★921 · Updated this week
- FlexAttention-based, minimal vllm-style inference engine for fast Gemma 2 inference · ★321 · Updated last month
- ★225 · Updated 3 weeks ago
- ★177 · Updated last year
- Simple MPI implementation for prototyping or learning · ★292 · Updated 4 months ago
- Google TPU optimizations for transformers models · ★124 · Updated 10 months ago
- Inference server benchmarking tool · ★130 · Updated 2 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ★64 · Updated 2 weeks ago
- Dion optimizer algorithm · ★403 · Updated last week
- Best practices & guides on how to write distributed PyTorch training code · ★552 · Updated last month
- Scalable and Performant Data Loading · ★352 · Updated this week
- ★113 · Updated 3 months ago
- ★219 · Updated 10 months ago
- Simple & Scalable Pretraining for Neural Architecture Research · ★304 · Updated last week
- SIMD quantization kernels · ★93 · Updated 3 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ★153 · Updated 2 years ago
- ★90 · Updated last year
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism · ★105 · Updated 2 months ago
- Learn GPU Programming in Mojo🔥 by Solving Puzzles · ★254 · Updated this week
- ★234 · Updated 5 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ★271 · Updated 3 weeks ago
- ★270 · Updated 2 weeks ago
- A tool to configure, launch and manage your machine learning experiments · ★210 · Updated this week
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP · ★141 · Updated 3 months ago
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs · ★724 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★267 · Updated last week