huggingface / kernels
Load compute kernels from the Hub
☆304 · Updated this week
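As a quick illustration of what "load compute kernels from the Hub" means in practice, here is a minimal sketch of the usage pattern the kernels project describes; the `get_kernel` entry point, the `kernels-community/activation` repository ID, and its `gelu_fast` function are assumptions based on the project's documentation, not a verified API surface.

```python
# Minimal sketch (assumptions: the `kernels` package exposes `get_kernel`,
# and the `kernels-community/activation` Hub repo provides `gelu_fast(out, x)`).
import torch
from kernels import get_kernel

# Download a pre-built compute kernel from the Hugging Face Hub.
activation = get_kernel("kernels-community/activation")

x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)  # run the Hub-loaded kernel, writing into `out`
print(out)
```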
Alternatives and similar repositories for kernels
Users interested in kernels are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (a minimal SDPA sketch follows this list) · ☆270 · Updated 2 months ago
- 👷 Build compute kernels · ☆163 · Updated this week
- ☆222 · Updated 3 weeks ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. · ☆296 · Updated 2 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ☆420 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX · ☆223 · Updated last year
- ☆121 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ☆195 · Updated 4 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ☆193 · Updated 4 months ago
- Triton-based implementation of Sparse Mixture of Experts. · ☆246 · Updated 3 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ☆215 · Updated this week
- ring-attention experiments · ☆154 · Updated last year
- Fast low-bit matmul kernels in Triton · ☆381 · Updated 3 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk · ☆180 · Updated this week
- Efficient LLM Inference over Long Sequences · ☆390 · Updated 3 months ago
- Applied AI experiments and examples for PyTorch · ☆299 · Updated 2 months ago
- ☆174 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ☆344 · Updated 5 months ago
- Cataloging released Triton kernels. · ☆263 · Updated last month
- ☆534 · Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ☆248 · Updated 8 months ago
- Normalized Transformer (nGPT) · ☆192 · Updated 11 months ago
- An extension of the nanoGPT repository for training small MOE models. · ☆202 · Updated 7 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ☆578 · Updated 2 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. · ☆91 · Updated 3 months ago
- ☆240 · Updated this week
- LLM KV cache compression made easy · ☆660 · Updated last week
- Pytorch DTensor native training library for LLMs/VLMs with OOTB Hugging Face support · ☆135 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ☆130 · Updated 10 months ago
- Scalable and Performant Data Loading · ☆311 · Updated this week
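For the FSDP/SDPA item near the top of the list, the sketch below shows PyTorch's built-in scaled dot-product attention (`torch.nn.functional.scaled_dot_product_attention`), which dispatches to a Flash-attention backend on supported GPUs. This is an illustrative example of the PyTorch API only, not code from that repository; the tensor shapes and the backend-forcing context manager are assumptions.

```python
# Illustrative SDPA usage (assumes a CUDA GPU and PyTorch >= 2.3 for sdpa_kernel).
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# (batch, heads, seq_len, head_dim) layout expected by SDPA.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch selects an efficient backend automatically (Flash attention where supported).
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Optionally restrict dispatch to the Flash-attention backend.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out_flash = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```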