Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
★1,585 · Jan 28, 2026 · Updated last month
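The "single line of code" refers to kernl's `optimize_model` entry point. A minimal sketch of the documented usage, assuming the `kernl` and `transformers` packages are installed and a CUDA GPU is available (illustrative only, since the optimization requires GPU hardware to run):

```python
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model

# Load a Hugging Face transformer and move it to the GPU in eval mode.
model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()

# The advertised one-liner: swaps in kernl's fused Triton kernels in place.
optimize_model(model)

# Subsequent inference calls use the optimized kernels.
with torch.inference_mode(), torch.cuda.amp.autocast():
    inputs = {"input_ids": torch.randint(0, 1000, (1, 128), device="cuda")}
    outputs = model(**inputs)
```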
Alternatives and similar repositories for kernl
Users interested in kernl are comparing it to the libraries listed below.
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ★1,688 · Oct 23, 2024 · Updated last year
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ★4,706 · Feb 27, 2026 · Updated last week
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ★1,075 · Apr 17, 2024 · Updated last year
- Transformer-related optimization, including BERT, GPT ★6,398 · Mar 27, 2024 · Updated last year
- Accessible large language models via k-bit quantization for PyTorch. ★8,019 · Updated this week
- Development repository for the Triton language and compiler ★18,501 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ★3,176 · Feb 28, 2026 · Updated last week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,097 · Jun 30, 2025 · Updated 8 months ago
- Running large language models on a single GPU for throughput-oriented scenarios. ★9,382 · Oct 28, 2024 · Updated last year
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ★595 · Aug 12, 2025 · Updated 6 months ago
- Training and serving large-scale neural networks with auto parallelization. ★3,184 · Dec 9, 2023 · Updated 2 years ago
- Hackable and optimized Transformers building blocks, supporting a composable construction. ★10,356 · Feb 20, 2026 · Updated 2 weeks ago
- GPTQ inference Triton kernel ★321 · May 18, 2023 · Updated 2 years ago
- Fast and memory-efficient exact attention ★22,460 · Updated this week
- An open-source efficient deep learning framework/compiler, written in Python. ★737 · Sep 4, 2025 · Updated 6 months ago
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ★1,863 · Updated this week
- FlashInfer: Kernel Library for LLM Serving ★5,057 · Updated this week
- Pipeline Parallelism for PyTorch ★786 · Aug 21, 2024 · Updated last year
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization… ★3,305 · Feb 9, 2026 · Updated 3 weeks ago
- Serving multiple LoRA-finetuned LLMs as one ★1,144 · May 8, 2024 · Updated last year
- Tile primitives for speedy kernels ★3,202 · Feb 24, 2026 · Updated last week
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ★4,738 · Jan 8, 2024 · Updated 2 years ago
- Foundation Architecture for (M)LLMs ★3,135 · Apr 11, 2024 · Updated last year
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,528 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★3,443 · Jul 17, 2025 · Updated 7 months ago
- PyTorch extensions for high-performance and large-scale training. ★3,400 · Apr 26, 2025 · Updated 10 months ago
- jax-triton contains integrations between JAX and OpenAI Triton ★439 · Feb 27, 2026 · Updated last week
- Sparsity-aware deep learning inference runtime for CPUs ★3,163 · Jun 2, 2025 · Updated 9 months ago
- Efficient Triton kernels for LLM training ★6,189 · Updated this week
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ★6,184 · Aug 22, 2025 · Updated 6 months ago
- Cramming the training of a (BERT-type) language model into limited compute. ★1,363 · Jun 13, 2024 · Updated last year
- PyTorch-native quantization and sparsity for training and inference ★2,707 · Updated this week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ★9,348 · Updated this week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ★3,919 · Feb 28, 2026 · Updated last week
- Minimalistic large language model 3D-parallelism training ★2,579 · Feb 19, 2026 · Updated 2 weeks ago
- Language Modeling with the H3 State Space Model ★522 · Sep 29, 2023 · Updated 2 years ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,262 · Mar 27, 2024 · Updated last year
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ★2,710 · Jun 25, 2024 · Updated last year
- Large Language Model Text Generation Inference ★10,788 · Jan 8, 2026 · Updated last month