huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
⭐279 · Updated 3 weeks ago
Alternatives and similar repositories for gpu-fryer
Users interested in gpu-fryer are comparing it to the libraries listed below:
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ⭐389 · Updated 2 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ⭐191 · Updated 2 months ago
- PyTorch Single Controller ⭐368 · Updated this week
- Load compute kernels from the Hub ⭐258 · Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ⭐562 · Updated this week
- Simple MPI implementation for prototyping or learning ⭐279 · Updated 3 weeks ago
- 👷 Build compute kernels ⭐119 · Updated this week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference ⭐261 · Updated 3 weeks ago
- Dion optimizer algorithm ⭐318 · Updated last week
- Inference server benchmarking tool ⭐94 · Updated 4 months ago
- A tool to configure, launch, and manage your machine learning experiments ⭐183 · Updated this week
- The Tensor (or Array) ⭐441 · Updated last year
- ⭐214 · Updated 6 months ago
- ⭐163 · Updated last year
- Scalable and Performant Data Loading ⭐291 · Updated last week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton ⭐571 · Updated 2 weeks ago
- Best practices & guides on how to write distributed PyTorch training code ⭐470 · Updated 6 months ago
- Google TPU optimizations for transformers models ⭐120 · Updated 7 months ago
- Slides, notes, and materials for the workshop ⭐331 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐266 · Updated 10 months ago
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ⭐371 · Updated 2 months ago
- ⭐238 · Updated this week
- A lightweight, local-first, and free experiment tracking Python library built on top of 🤗 Datasets and Spaces ⭐674 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐262 · Updated last month
- ⭐217 · Updated 7 months ago
- ⭐206 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ⭐310 · Updated last week
- Simple & Scalable Pretraining for Neural Architecture Research ⭐289 · Updated last week
- ⭐527 · Updated last year
- ⭐88 · Updated last year