huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
★236 · Updated 4 months ago
Alternatives and similar repositories for gpu-fryer
Users interested in gpu-fryer are comparing it to the repositories listed below.
- PyTorch Single Controller · ★318 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ★188 · Updated last month
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ★361 · Updated this week
- Load compute kernels from the Hub (see the usage sketch after this list) · ★207 · Updated this week
- Scalable and Performant Data Loading · ★288 · Updated last week
- ★161 · Updated last year
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs · ★430 · Updated last week
- Simple MPI implementation for prototyping or learning · ★263 · Updated 3 weeks ago
- ★214 · Updated 5 months ago
- Google TPU optimizations for transformers models · ★116 · Updated 5 months ago
- Inference server benchmarking tool · ★83 · Updated 2 months ago
- ★96 · Updated last month
- ★88 · Updated last year
- ★200 · Updated 5 months ago
- ★228 · Updated last week
- ★186 · Updated this week
- A tool to configure, launch and manage your machine learning experiments · ★171 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ★256 · Updated last week
- Best practices & guides on how to write distributed PyTorch training code · ★450 · Updated 4 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ★46 · Updated this week
- An extension of the nanoGPT repository for training small MoE models · ★162 · Updated 4 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code · ★378 · Updated 4 months ago
- PTX tutorial written purely by AIs (OpenAI's Deep Research and Claude 3.7) · ★66 · Updated 3 months ago
- 👷 Build compute kernels · ★77 · Updated this week
- Cray-LM unified training and inference stack · ★22 · Updated 5 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ★135 · Updated last year
- DeMo: Decoupled Momentum Optimization · ★189 · Updated 7 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 (the early-exit idea is sketched after this list) · ★318 · Updated 2 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) · ★156 · Updated this week
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer finetuning · ★46 · Updated 4 months ago
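
The "Load compute kernels from the Hub" entry above refers to the huggingface/kernels project. Below is a minimal usage sketch based on its documented `get_kernel` entry point; the `kernels-community/activation` repo id and the `gelu_fast` function mirror the project's README example, so treat the exact names as assumptions rather than a guaranteed stable API.

```python
import torch
from kernels import get_kernel

# Fetch a pre-built kernel straight from the Hugging Face Hub
# (no local compilation step; repo id assumed from the README example).
activation = get_kernel("kernels-community/activation")

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
out = torch.empty_like(x)

# The loaded module exposes its kernels as plain Python functions;
# gelu_fast writes the GELU activation of `x` into `out`.
activation.gelu_fast(out, x)
print(out)
```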
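
The LayerSkip entry describes early-exit inference: a model trained so that intermediate layers produce usable logits can stop computing once an intermediate prediction is confident enough. The sketch below illustrates only that general idea; every name in it is a hypothetical stand-in, not LayerSkip's actual API, and the self-speculative part (verifying early-exit tokens with the remaining layers) is omitted.

```python
import torch
import torch.nn.functional as F

def early_exit_next_token(layers, norm, lm_head, hidden, threshold=0.9):
    """Hypothetical early-exit sketch (not LayerSkip's API).

    `layers` is a list of decoder-layer modules, `norm` and `lm_head` the
    final norm and shared language-model head, and `hidden` the embedded
    prompt of shape (batch, seq, dim). After each layer we project to the
    vocabulary and exit as soon as the top token clears the threshold.
    """
    token, depth = None, len(layers)
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        logits = lm_head(norm(hidden))               # same head at every depth
        probs = F.softmax(logits[:, -1, :], dim=-1)  # next-token distribution
        confidence, token = probs.max(dim=-1)
        if confidence.min().item() >= threshold:     # whole batch is confident
            depth = i + 1                            # layers actually used
            break
    return token, depth
```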