huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
⭐285 · Updated 3 weeks ago
Alternatives and similar repositories for gpu-fryer
Users interested in gpu-fryer are comparing it to the libraries listed below.
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ⭐414 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ⭐194 · Updated 4 months ago
- PyTorch Single Controller · ⭐435 · Updated this week
- Load compute kernels from the Hub · ⭐293 · Updated last week
- 👷 Build compute kernels · ⭐155 · Updated this week
- Inference server benchmarking tool · ⭐113 · Updated last week
- ⭐89 · Updated last year
- Simple MPI implementation for prototyping or learning · ⭐284 · Updated 2 months ago
- ⭐173 · Updated last year
- ⭐222 · Updated last week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference · ⭐290 · Updated 2 months ago
- Slides, notes, and materials for the workshop · ⭐332 · Updated last year
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism · ⭐95 · Updated 2 weeks ago
- Scalable and performant data loading · ⭐304 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐269 · Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ⭐341 · Updated 5 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ⭐144 · Updated last year
- SIMD quantization kernels · ⭐87 · Updated last month
- Dion optimizer algorithm · ⭐361 · Updated last week
- Cray-LM unified training and inference stack · ⭐22 · Updated 8 months ago
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs · ⭐650 · Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's language for writing efficient GPU code · ⭐420 · Updated 7 months ago
- ⭐255 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! · ⭐58 · Updated 2 weeks ago
- Best practices and guides on how to write distributed PyTorch training code · ⭐494 · Updated this week
- Simple and scalable pretraining for neural architecture research · ⭐296 · Updated last month
- A tool to configure, launch, and manage your machine learning experiments · ⭐197 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference · ⭐270 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐265 · Updated last year
- ⭐106 · Updated last month