Build compute kernels and load them from the Hub.
★676 · Jun 5, 2026 · Updated this week
Alternatives and similar repositories for kernels
Users who are interested in kernels are comparing it to the libraries listed below.
- Build compute kernels · ★213 · Apr 6, 2026 · Updated 2 months ago
- A Quirky Assortment of CuTe Kernels · ★994 · May 30, 2026 · Updated last week
- Kernel sources for https://huggingface.co/kernels-community · ★125 · Updated this week
- Hugging Face Jobs · ★20 · Jul 11, 2025 · Updated 10 months ago
- Minimalistic large language model 3D-parallelism training · ★2,711 · May 26, 2026 · Updated 2 weeks ago
- ★29 · May 26, 2026 · Updated 2 weeks ago
- Efficient Triton Kernels for LLM Training · ★6,415 · Updated this week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. · ★875 · Updated this week
- FlashInfer: Kernel Library for LLM Serving · ★5,760 · Updated this week
- Tile primitives for speedy kernels · ★3,405 · May 27, 2026 · Updated last week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) · ★1,045 · Mar 24, 2026 · Updated 2 months ago
- [ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models" · ★22 · Jan 16, 2025 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. · ★358 · Updated this week
- Automatically derive Python dunder methods for your Rust code · ★26 · May 26, 2026 · Updated 2 weeks ago
- ★210 · May 5, 2025 · Updated last year
- Minimalistic 4D-parallelism distributed training framework for education purpose · ★2,216 · Aug 26, 2025 · Updated 9 months ago
- Applied AI experiments and examples for PyTorch · ★323 · Aug 22, 2025 · Updated 9 months ago
- Framework to reduce autotune overhead to zero for well known deployments. · ★101 · Sep 19, 2025 · Updated 8 months ago
- Efficient implementations for emerging model architectures · ★5,182 · Updated this week
- ★52 · May 19, 2025 · Updated last year
- Fast low-bit matmul kernels in Triton · ★467 · May 15, 2026 · Updated 3 weeks ago
- PyTorch native quantization and sparsity for training and inference · ★2,847 · Updated this week
- FlexAttention w/ FlashAttention3 Support · ★27 · Oct 5, 2024 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens. · ★1,082 · Sep 4, 2024 · Updated last year
- Distributed Compiler based on Triton for Parallel Systems · ★1,455 · Apr 22, 2026 · Updated last month
- ★32 · Jul 2, 2025 · Updated 11 months ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel · ★2,293 · Updated this week
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ★251 · Jun 15, 2025 · Updated 11 months ago
- A PyTorch native platform for training generative AI models · ★5,416 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… · ★3,381 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ★2,437 · May 29, 2026 · Updated last week
- Helpful tools and examples for working with flex-attention · ★1,195 · May 28, 2026 · Updated last week
- ★267 · Jul 11, 2024 · Updated last year
- A pytorch quantization backend for optimum · ★1,042 · Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels · ★6,434 · Updated this week
- A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… · ★337 · May 26, 2026 · Updated 2 weeks ago
- ★57 · Feb 24, 2026 · Updated 3 months ago
- Quantized Attention on GPU · ★44 · Nov 22, 2024 · Updated last year
- ★14 · Dec 21, 2025 · Updated 5 months ago