huggingface / kernels
Load compute kernels from the Hub
☆290, updated last week
Alternatives and similar repositories for kernels
Users who are interested in kernels compare it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆268, updated 2 months ago)
- Build compute kernels (☆149, updated this week)
- (☆221, updated 7 months ago)
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. (☆280, updated last month)
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand (☆193, updated 4 months ago)
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) (☆410, updated this week)
- (☆122, updated last year)
- This repository contains the experimental PyTorch native float8 training UX (☆224, updated last year)
- ring-attention experiments (☆152, updated 11 months ago)
- Flash-Muon: An Efficient Implementation of Muon Optimizer (☆189, updated 3 months ago)
- Scalable and Performant Data Loading (☆304, updated last week)
- (☆173, updated last year)
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. (☆213, updated this week)
- An extension of the nanoGPT repository for training small MOE models. (☆195, updated 6 months ago)
- Triton-based implementation of Sparse Mixture of Experts. (☆241, updated last month)
- Memory optimized Mixture of Experts (☆67, updated 2 months ago)
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) (☆218, updated last week)
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. (☆89, updated 2 months ago)
- A safetensors extension to efficiently store sparse quantized tensors on disk (☆164, updated this week)
- Efficient LLM Inference over Long Sequences (☆391, updated 3 months ago)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters (☆130, updated 10 months ago)
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 (☆338, updated 5 months ago)
- Normalized Transformer (nGPT) (☆191, updated 10 months ago)
- PyTorch Single Controller (☆425, updated this week)
- Google TPU optimizations for transformers models (☆120, updated 8 months ago)
- The evaluation framework for training-free sparse attention in LLMs (☆98, updated 3 months ago)
- (☆527, updated last week)
- Learn CUDA with PyTorch (☆84, updated last week)
- Fast low-bit matmul kernels in Triton (☆373, updated last week)
- (☆89, updated last year)