huggingface / kernels
Load compute kernels from the Hub
☆389 · Updated last week
Alternatives and similar repositories for kernels
Users who are interested in kernels are comparing it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆279 · Updated 2 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆334 · Updated 3 months ago
- Build compute kernels ☆214 · Updated last week
- ☆230 · Updated 2 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆197 · Updated 8 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 · Updated this week
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support ☆266 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆474 · Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆315 · Updated 5 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆229 · Updated 7 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆238 · Updated this week
- ring-attention experiments ☆165 · Updated last year
- ☆178 · Updated 2 years ago
- Fast low-bit matmul kernels in Triton ☆427 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts. ☆263 · Updated 4 months ago
- Efficient LLM Inference over Long Sequences ☆394 · Updated 7 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆228 · Updated this week
- ☆286 · Updated this week
- Cataloging released Triton kernels. ☆291 · Updated 4 months ago
- ☆124 · Updated last year
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆596 · Updated 5 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆356 · Updated 2 weeks ago
- An extension of the nanoGPT repository for training small MoE models. ☆233 · Updated 10 months ago
- ☆579 · Updated 4 months ago
- Scalable and Performant Data Loading ☆364 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆273 · Updated this week
- Official implementation for Training LLMs with MXFP4 ☆118 · Updated 9 months ago
- ☆92 · Updated last year
- ☆147 · Updated this week