huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
★234 · Updated 3 months ago
Alternatives and similar repositories for gpu-fryer
Users interested in gpu-fryer are comparing it to the libraries listed below.
- PyTorch Single Controller · ★231 · Updated this week
- PyTorch per step fault tolerance (actively under development) · ★329 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ★185 · Updated 3 weeks ago
- Load compute kernels from the Hub · ★191 · Updated last week
- Scalable and Performant Data Loading · ★278 · Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs · ★399 · Updated 2 weeks ago
- ★222 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! · ★46 · Updated this week
- ★213 · Updated 5 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ★311 · Updated last month
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ★358 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ★253 · Updated last week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems · ★425 · Updated 3 weeks ago
- LLM KV cache compression made easy · ★520 · Updated last week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. · ★137 · Updated 11 months ago
- ★88 · Updated last year
- Fast low-bit matmul kernels in Triton · ★323 · Updated last week
- Efficient LLM Inference over Long Sequences · ★378 · Updated 3 weeks ago
- Cray-LM unified training and inference stack. · ★22 · Updated 4 months ago
- TorchFix - a linter for PyTorch-using code with autofix support · ★143 · Updated 4 months ago
- A tool to configure, launch and manage your machine learning experiments. · ★162 · Updated this week
- ★126 · Updated last month
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… · ★50 · Updated last week
- ★183 · Updated this week
- Inference server benchmarking tool · ★74 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★264 · Updated 8 months ago
- Google TPU optimizations for transformers models · ★113 · Updated 5 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton (see the Triton sketch after this list). · ★556 · Updated last week
- Slides, notes, and materials for the workshop · ★326 · Updated last year
- Applied AI experiments and examples for PyTorch · ★277 · Updated 3 weeks ago
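Several entries above (the low-bit matmul kernels and the Triton reimplementation of PyTorch modules) are built around the same basic pattern: a `@triton.jit` kernel that each GPU program instance applies to one block of elements, launched over a grid from Python. Below is a minimal vector-add sketch in the style of the official Triton tutorial — illustrative of that pattern only, not code taken from any repository listed here.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the input.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard lanes that fall past the end
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out


x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```

Libraries like the ones listed swap the elementwise body for fused attention, quantized matmul, or normalization logic, but the block-per-program launch structure stays the same.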