ELS-RD / kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

☆1,585

Alternatives and similar repositories for kernl

Users who are interested in kernl compare it to the libraries listed below.


pytorch / torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,066 · Updated last year
ELS-RD / transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
☆1,689 · Updated last year
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,073 · Updated 4 months ago
facebookresearch / bitsandbytes
Library for 8-bit optimizers and quantization routines.
☆780 · Updated 3 years ago
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆780 · Updated last year
JonasGeiping / cramming
Cramming the training of a (BERT-type) language model into limited compute.
☆1,350 · Updated last year
hidet-org / hidet
An open-source efficient deep learning framework/compiler, written in python.
☆733 · Updated 2 months ago
microsoft / mup
maximal update parametrization (µP)
☆1,613 · Updated last year
IST-DASLab / gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,212 · Updated last year
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,883 · Updated this week
bigscience-workshop / bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
☆1,006 · Updated last year
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,841 · Updated this week
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,385 · Updated 6 months ago
alpa-projects / alpa
Training and serving large-scale neural networks with auto parallelization.
☆3,160 · Updated last year
tunib-ai / parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
☆790 · Updated 2 years ago
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,689 · Updated last week
Liuhong99 / Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
☆977 · Updated last year
meta-pytorch / data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
☆1,231 · Updated this week
lucidrains / RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
☆875 · Updated 2 years ago
mit-han-lab / smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆1,543 · Updated last year
BlackSamorez / tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
☆658 · Updated last year
microsoft / torchscale
Foundation Architecture for (M)LLMs
☆3,119 · Updated last year
bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,425 · Updated last year
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆883 · Updated last week
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆1,004 · Updated 2 weeks ago
huggingface / transformers-bloom-inference
Fast Inference Solutions for BLOOM
☆565 · Updated last year
punica-ai / punica
Serving multiple LoRA finetuned LLM as one
☆1,110 · Updated last year
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,295 · Updated 8 months ago
triton-inference-server / fastertransformer_backend
☆413 · Updated last year
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,489 · Updated this week