huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
★218 · Updated 3 weeks ago
Alternatives and similar repositories for gpu-fryer:
Users interested in gpu-fryer are comparing it to the libraries listed below.
- PyTorch per-step fault tolerance (actively under development) ★271 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ★169 · Updated last week
- ★87 · Updated last year
- ★152 · Updated last year
- Scalable and Performant Data Loading ★231 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ★234 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ★154 · Updated last week
- ★204 · Updated 2 months ago
- ★158 · Updated last month
- High-Performance SGEMM on CUDA devices ★88 · Updated 2 months ago
- Inference server benchmarking tool ★38 · Updated this week
- Learning about CUDA by writing PTX code. ★125 · Updated last year
- Load compute kernels from the Hub ★107 · Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ★318 · Updated 3 weeks ago
- Best practices & guides on how to write distributed PyTorch training code ★383 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ★262 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton ★275 · Updated this week
- Applied AI experiments and examples for PyTorch ★251 · Updated last week
- A tool to configure, launch and manage your machine learning experiments. ★133 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ★222 · Updated 8 months ago
- Google TPU optimizations for transformers models ★104 · Updated 2 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ★35 · Updated this week
- LLM KV cache compression made easy ★444 · Updated 2 weeks ago
- ★192 · Updated this week
- Cray-LM unified training and inference stack. ★21 · Updated 2 months ago
- PTX-Tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ★62 · Updated last week
- ★47 · Updated this week
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ★127 · Updated 3 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ★524 · Updated last month
- Slides, notes, and materials for the workshop ★321 · Updated 10 months ago