huggingface / kernels
Load compute kernels from the Hub
☆203 · Updated this week
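For reference, loading and running a Hub kernel with this library looks roughly like the sketch below. It is based on the `get_kernel` entry point the project exposes; the `kernels-community/activation` repository and the `gelu_fast` function are illustrative choices, and the example assumes a CUDA device is available.

```python
import torch
from kernels import get_kernel

# Fetch a pre-built compute kernel from the Hugging Face Hub
# ("kernels-community/activation" is an illustrative kernel repo).
activation = get_kernel("kernels-community/activation")

# Allocate input and output tensors on the GPU; Hub kernels are
# typically compiled for CUDA devices.
x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)

# Call one of the functions the loaded kernel module exposes
# (here an assumed fused GELU writing into the preallocated output).
activation.gelu_fast(y, x)

print(y)
```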
Alternatives and similar repositories for kernels
Users interested in kernels are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆255 · Updated this week
- ☆112 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated 11 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆138 · Updated 3 weeks ago
- ring-attention experiments ☆144 · Updated 8 months ago
- ☆160 · Updated last year
- A Quirky Assortment of CuTe Kernels ☆126 · Updated last week
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆78 · Updated 3 weeks ago
- 👷 Build compute kernels ☆74 · Updated this week
- ☆88 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆135 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts. ☆224 · Updated 7 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆127 · Updated 7 months ago
- ☆198 · Updated 5 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆59 · Updated 9 months ago
- Fast low-bit matmul kernels in Triton ☆327 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆359 · Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆205 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆151 · Updated this week
- ☆79 · Updated last year
- ☆225 · Updated this week
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆138 · Updated 11 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆243 · Updated 5 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆188 · Updated last month
- The evaluation framework for training-free sparse attention in LLMs ☆82 · Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆281 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆80 · Updated 10 months ago
- ☆214 · Updated 5 months ago
- ☆116 · Updated last month
- Efficient LLM Inference over Long Sequences ☆382 · Updated 2 weeks ago