huggingface / kernels
Load compute kernels from the Hub
⭐271 · Updated this week
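As a quick orientation, here is a minimal usage sketch of the library this page is about. It follows the pattern shown in the project's public examples; the `kernels-community/activation` repo id and the `gelu_fast` entry point are taken from those examples and may change, so treat this as an illustration rather than a stable API reference:

```python
import torch
from kernels import get_kernel

# Fetch a pre-built, optimized kernel from the Hugging Face Hub.
# "kernels-community/activation" is the activation-kernel repo used in the
# project's examples (assumed here; swap in any kernel repo on the Hub).
activation = get_kernel("kernels-community/activation")

# Input tensor on the GPU; the loaded kernel writes into `out` in place.
x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)
```

The point of the library is that the kernel binary is downloaded from the Hub and loaded at runtime, so no local CUDA toolchain or compilation step is needed.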
Alternatives and similar repositories for kernels
Users interested in kernels are comparing it to the libraries listed below.
- Build compute kernels · ⭐136 · Updated this week
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐265 · Updated last month
- ⭐118 · Updated last year
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference (see the FlexAttention sketch after this list). · ⭐269 · Updated last month
- This repository contains the experimental PyTorch-native float8 training UX · ⭐224 · Updated last year
- Flash-Muon: An Efficient Implementation of the Muon Optimizer · ⭐181 · Updated 2 months ago
- ⭐216 · Updated 6 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ⭐395 · Updated 2 weeks ago
- ring-attention experiments · ⭐150 · Updated 10 months ago
- Triton-based implementation of Sparse Mixture of Experts. · ⭐238 · Updated 2 weeks ago
- PyTorch Single Controller · ⭐393 · Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk · ⭐156 · Updated last week
- ⭐167 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs · ⭐91 · Updated 2 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ⭐193 · Updated 3 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ⭐209 · Updated last week
- Applied AI experiments and examples for PyTorch · ⭐295 · Updated 3 weeks ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… · ⭐145 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters · ⭐129 · Updated 9 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] · ⭐60 · Updated 11 months ago
- The simplest implementation of recent sparse-attention patterns for efficient LLM inference. · ⭐86 · Updated last month
- Memory-optimized Mixture of Experts · ⭐62 · Updated last month
- Cataloging released Triton kernels. · ⭐252 · Updated this week
- Fast low-bit matmul kernels in Triton · ⭐357 · Updated this week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ⭐245 · Updated 7 months ago
- Google TPU optimizations for transformers models · ⭐120 · Updated 7 months ago
- Code for studying the super weight in LLMs · ⭐117 · Updated 9 months ago
- Efficient LLM Inference over Long Sequences · ⭐391 · Updated 2 months ago
- LLM KV cache compression made easy · ⭐604 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ⭐575 · Updated last month
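The FlexAttention-based engine referenced above builds on PyTorch's native `torch.nn.attention.flex_attention` API rather than hand-written CUDA. A minimal sketch of that underlying PyTorch primitive (not that repo's own code), assuming PyTorch 2.5+ and a CUDA device:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# The mask is a predicate over (batch, head, query index, kv index)
# rather than a dense [S, S] tensor; here, plain causal masking.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

B, H, S, D = 1, 8, 1024, 64
q, k, v = (
    torch.randn(B, H, S, D, device="cuda", dtype=torch.float16) for _ in range(3)
)

# The block mask lets the fused kernel skip fully-masked tiles entirely.
block_mask = create_block_mask(causal, B, H, S, S, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)
```

In practice `flex_attention` is wrapped in `torch.compile` so the mask predicate is fused into a single generated Triton kernel; eager execution works but is slow.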