ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
⭐ 1,588 · Updated last year
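As a quick illustration of the "single line of code" claim, here is a minimal sketch that follows kernl's documented quickstart. The `optimize_model` import path and the fp16/CUDA-graph inference setup are taken from the project's README; treat the exact names as assumptions if the API has changed since.

```python
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # entry point per kernl's README

# Load any Hugging Face transformer in eval mode on the GPU.
model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()

# The "single line": replaces supported modules with fused Triton kernels
# and wraps execution in CUDA graphs.
optimize_model(model)

# Inference afterwards is unchanged; fp16 autocast is the typical setup.
inputs = {
    "input_ids": torch.ones((1, 128), dtype=torch.long, device="cuda"),
    "attention_mask": torch.ones((1, 128), dtype=torch.long, device="cuda"),
}
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    outputs = model(**inputs)
```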
Alternatives and similar repositories for kernl
Users interested in kernl are comparing it to the libraries listed below.
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ⭐ 1,071 · Updated last year
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ⭐ 1,690 · Updated last year
- Library for 8-bit optimizers and quantization routines (see the usage sketch after this list). ⭐ 780 · Updated 3 years ago
- Cramming the training of a (BERT-type) language model into limited compute. ⭐ 1,361 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,090 · Updated 6 months ago
- Pipeline Parallelism for PyTorch ⭐ 785 · Updated last year
- An open-source efficient deep learning framework/compiler, written in python. ⭐ 740 · Updated 4 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐ 2,247 · Updated last year
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ⭐ 4,701 · Updated 2 weeks ago
- maximal update parametrization (µP) ⭐ 1,662 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ⭐ 3,116 · Updated this week
- Training and serving large-scale neural networks with auto parallelization. ⭐ 3,173 · Updated 2 years ago
- PyTorch extensions for high performance and large scale training. ⭐ 3,393 · Updated 9 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ⭐ 1,590 · Updated last year
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ⭐ 1,857 · Updated this week
- The official implementation of "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training" ⭐ 981 · Updated last year
- SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, … ⭐ 2,574 · Updated this week
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ⭐ 1,007 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ⭐ 1,429 · Updated last year
- A pytorch quantization backend for optimum ⭐ 1,020 · Updated 2 months ago
- Automatically split your PyTorch models on multiple GPUs for training & inference ⭐ 657 · Updated 2 years ago
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ⭐ 877 · Updated 2 years ago
- ⭐ 413 · Updated 2 years ago
- ⭐ 551 · Updated last year
- Fast Inference Solutions for BLOOM ⭐ 566 · Updated last year
- Fast & Simple repository for pre-training and fine-tuning T5-style models ⭐ 1,018 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ⭐ 1,315 · Updated 10 months ago
- Language Modeling with the H3 State Space Model ⭐ 522 · Updated 2 years ago
- Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4 ⭐ 956 · Updated last month
- Microsoft Automatic Mixed Precision Library ⭐ 635 · Updated last month
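For the 8-bit optimizers entry above, a minimal sketch of the drop-in pattern bitsandbytes documents: swap a `torch.optim` optimizer for its 8-bit counterpart and leave the training loop untouched. The `Adam8bit` class name and hyperparameters follow the bitsandbytes README; treat them as assumptions if the API has moved.

```python
import torch
import bitsandbytes as bnb

# Any PyTorch module works; a small linear layer keeps the sketch self-contained.
model = torch.nn.Linear(512, 512).cuda()

# Drop-in replacement for torch.optim.Adam that stores optimizer state in 8-bit,
# reducing optimizer memory for large models.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3, betas=(0.9, 0.995))

# Standard training step; nothing else in the loop changes.
inputs = torch.randn(16, 512, device="cuda")
loss = model(inputs).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```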