huggingface / kernels
Load compute kernels from the Hub
⭐ 389 · Updated last week
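As a quick orientation before the list of alternatives, the sketch below shows how a Hub-hosted kernel is typically loaded with the kernels package. It follows the pattern from the project's documented example, but the `get_kernel` entry point, the `kernels-community/activation` repo id, and the `gelu_fast` function name should be treated as assumptions rather than a verified API reference.

```python
# Minimal sketch, assuming the kernels package exposes a get_kernel() loader
# as in its documented example. Requires a CUDA device.
import torch
from kernels import get_kernel

# Download the kernel repo from the Hugging Face Hub and load its compiled
# binaries; "kernels-community/activation" is an illustrative repo id.
activation = get_kernel("kernels-community/activation")

x = torch.randn((16, 1024), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)

# Kernel functions are exposed as attributes of the loaded module;
# gelu_fast is assumed here for illustration.
activation.gelu_fast(y, x)
```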
Alternatives and similar repositories for kernels
Users interested in kernels are comparing it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐ 279 · Updated 2 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. · ⭐ 334 · Updated 3 months ago
- Build compute kernels · ⭐ 214 · Updated last week
- ⭐ 232 · Updated 2 months ago
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support · ⭐ 266 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ⭐ 197 · Updated 8 months ago
- ⭐ 579 · Updated 4 months ago
- ring-attention experiments · ⭐ 165 · Updated last year
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ⭐ 475 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX · ⭐ 227 · Updated last year
- Efficient LLM Inference over Long Sequences · ⭐ 394 · Updated 7 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ⭐ 219 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts. · ⭐ 263 · Updated 4 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. · ⭐ 228 · Updated this week
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ⭐ 233 · Updated 7 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk · ⭐ 238 · Updated this week
- An extension of the nanoGPT repository for training small MOE models. · ⭐ 233 · Updated 10 months ago
- ⭐ 178 · Updated 2 years ago
- Applied AI experiments and examples for PyTorch · ⭐ 315 · Updated 5 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ⭐ 595 · Updated 5 months ago
- Normalized Transformer (nGPT) · ⭐ 198 · Updated last year
- ⭐ 124 · Updated last year
- A bunch of kernels that might make stuff slower · ⭐ 75 · Updated last week
- Accelerating MoE with IO and Tile-aware Optimizations · ⭐ 569 · Updated 2 weeks ago
- ⭐ 286 · Updated this week
- Fast low-bit matmul kernels in Triton · ⭐ 427 · Updated this week
- Scalable and Performant Data Loading · ⭐ 364 · Updated this week
- Cataloging released Triton kernels. · ⭐ 292 · Updated 4 months ago
- Memory optimized Mixture of Experts · ⭐ 73 · Updated 6 months ago
- ⭐ 92 · Updated last year