huggingface / kernels
Load compute kernels from the Hub
⭐347 · Updated this week
Alternatives and similar repositories for kernels
Users interested in kernels are comparing it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and an SDPA implementation of Flash… ⭐271 · Updated 3 weeks ago
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ⭐321 · Updated last month
- Build compute kernels ⭐193 · Updated this week
- ⭐225 · Updated 3 weeks ago
- PyTorch-native distributed training library for LLMs/VLMs with out-of-the-box Hugging Face support ⭐202 · Updated last week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ⭐456 · Updated last week
- Flash-Muon: An Efficient Implementation of Muon Optimizer ⭐222 · Updated 6 months ago
- This repository contains the experimental PyTorch-native float8 training UX ⭐227 · Updated last year
- ring-attention experiments ⭐160 · Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ⭐195 · Updated 6 months ago
- Applied AI experiments and examples for PyTorch ⭐311 · Updated 3 months ago
- Memory-optimized Mixture of Experts ⭐69 · Updated 4 months ago
- Scalable and Performant Data Loading ⭐352 · Updated this week
- ⭐121 · Updated last year
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch-native components. ⭐216 · Updated this week
- Learn CUDA with PyTorch ⭐124 · Updated 3 weeks ago
- Triton-based implementation of Sparse Mixture of Experts. ⭐253 · Updated 2 months ago
- ⭐177 · Updated last year
- Efficient LLM Inference over Long Sequences ⭐393 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton