huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
⭐ 225 · Updated last month
Alternatives and similar repositories for gpu-fryer:
Users that are interested in gpu-fryer are comparing it to the libraries listed below
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ⭐ 178 · Updated this week
- PyTorch per step fault tolerance (actively under development) · ⭐ 284 · Updated this week
- ⭐ 169 · Updated 2 months ago
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐ 240 · Updated last week
- An extension of the nanoGPT repository for training small MOE models. · ⭐ 131 · Updated last month
- Scalable and Performant Data Loading · ⭐ 237 · Updated last week
- ⭐ 208 · Updated 3 months ago
- ⭐ 87 · Updated last year
- Load compute kernels from the Hub · ⭐ 115 · Updated this week
- ⭐ 196 · Updated 3 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ⭐ 40 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ⭐ 165 · Updated last month
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ⭐ 286 · Updated last week
- Inference server benchmarking tool · ⭐ 53 · Updated 3 weeks ago
- Learning about CUDA by writing PTX code. · ⭐ 128 · Updated last year
- ⭐ 153 · Updated last year
- LLM KV cache compression made easy · ⭐ 458 · Updated last week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs · ⭐ 246 · Updated this week
- ⭐ 171 · Updated this week
- A tool to configure, launch and manage your machine learning experiments. · ⭐ 139 · Updated this week
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) · ⭐ 65 · Updated last month
- Google TPU optimizations for transformers models · ⭐ 108 · Updated 3 months ago
- Fast low-bit matmul kernels in Triton · ⭐ 291 · Updated this week
- Best practices & guides on how to write distributed pytorch training code · ⭐ 401 · Updated 2 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. · ⭐ 337 · Updated last month
- An implementation of PSGD Kron second-order optimizer for PyTorch · ⭐ 89 · Updated 3 weeks ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. · ⭐ 130 · Updated last year
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ⭐ 534 · Updated this week
- Applied AI experiments and examples for PyTorch · ⭐ 261 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐ 262 · Updated 6 months ago