huggingface/kernels
Load compute kernels from the Hub
☆214, updated last week
Alternatives and similar repositories for kernels
Users interested in kernels are comparing it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆258, updated last week)
- ☆113, updated last year
- Build compute kernels (☆79, updated this week)
- Flash-Muon: An Efficient Implementation of Muon Optimizer (☆147, updated last month)
- This repository contains the experimental PyTorch native float8 training UX (☆224, updated last year)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters (☆127, updated 7 months ago)
- A safetensors extension to efficiently store sparse quantized tensors on disk (☆141, updated this week)
- Triton-based implementation of Sparse Mixture of Experts (☆227, updated 8 months ago)
- ☆162, updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference (☆82, updated 2 weeks ago)
- ring-attention experiments (☆145, updated 9 months ago)
- ☆88, updated last year
- ☆203, updated 5 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" (☆244, updated 6 months ago)
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) (☆366, updated last week)
- ☆228, updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand (☆188, updated 2 months ago)
- Experiment of using Tangent to autodiff Triton (☆79, updated last year)
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] (☆61, updated 9 months ago)
- Fast low-bit matmul kernels in Triton (☆338, updated this week)
- ☆82, updated last year
- The evaluation framework for training-free sparse attention in LLMs (☆85, updated last month)
- ☆227, updated this week
- Code for studying the super weight in LLM (☆114, updated 7 months ago)
- Applied AI experiments and examples for PyTorch (☆289, updated 2 months ago)
- ☆122, updated 2 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components (☆207, updated last week)
- JAX bindings for Flash Attention v2 (☆90, updated this week)
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… (☆138, updated 11 months ago)
- Efficient LLM Inference over Long Sequences (☆385, updated last month)