huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
★282 Updated last week
Alternatives and similar repositories for gpu-fryer
Users who are interested in gpu-fryer compare it to the libraries listed below.
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ★401 Updated 3 weeks ago
- 👷 Build compute kernels ★143 Updated this week
- PyTorch Single Controller ★419 Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ★193 Updated 3 months ago
- Load compute kernels from the Hub ★283 Updated this week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ★274 Updated last month
- ★217 Updated 7 months ago
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ★605 Updated last week
- Simple MPI implementation for prototyping or learning ★279 Updated last month
- Inference server benchmarking tool ★100 Updated 4 months ago
- ★171 Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ★57 Updated this week
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ★265 Updated last month
- Scalable and Performant Data Loading ★302 Updated this week
- Best practices & guides on how to write distributed pytorch training code ★478 Updated 6 months ago
- ★89 Updated last year
- Dion optimizer algorithm ★343 Updated 2 weeks ago
- ★217 Updated 7 months ago
- Slides, notes, and materials for the workshop ★331 Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ★142 Updated last year
- An extension of the nanoGPT repository for training small MOE models. ★187 Updated 6 months ago
- Google TPU optimizations for transformers models ★120 Updated 8 months ago
- For optimization algorithm research and development. ★538 Updated this week
- TorchFix - a linter for PyTorch-using code with autofix support ★147 Updated 3 weeks ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ★408 Updated 6 months ago
- ★223 Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ★335 Updated 4 months ago
- ★104 Updated 2 weeks ago
- Simple & Scalable Pretraining for Neural Architecture Research ★293 Updated 3 weeks ago
- code for training & evaluating Contextual Document Embedding models ★197 Updated 4 months ago