ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
★1,585 · Updated last year
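The "single line of code" claim maps to kernl's optimization entry point. A minimal, hedged sketch follows, assuming the `kernl.model_optimization.optimize_model` API shown in the project's README; actually running the optimization requires a CUDA GPU and `pip install kernl`, so the sketch degrades gracefully when the package is absent:

```python
# Hedged sketch of kernl's one-line usage (assumption: the entry point is
# kernl.model_optimization.optimize_model, per the project README).
try:
    from kernl.model_optimization import optimize_model
    HAVE_KERNL = True
except ImportError:  # kernl not installed in this environment
    HAVE_KERNL = False

def accelerate(model):
    """Apply kernl's in-place optimization when available; otherwise
    return the model unchanged so calling code still works."""
    if HAVE_KERNL:
        optimize_model(model)  # swaps forward passes for fused Triton kernels
    return model
```

In practice you would load a Hugging Face transformer, move it to GPU, and pass it through `accelerate` (or call `optimize_model` directly) before inference.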
Alternatives and similar repositories for kernl
Users interested in kernl are comparing it to the libraries listed below.
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ★1,071 · Updated last year
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ★1,689 · Updated last year
- Library for 8-bit optimizers and quantization routines. ★781 · Updated 3 years ago
- Pipeline Parallelism for PyTorch ★784 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,085 · Updated 6 months ago
- maximal update parametrization (µP) ★1,650 · Updated last year
- Cramming the training of a (BERT-type) language model into limited compute. ★1,357 · Updated last year
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ★1,007 · Updated last year
- An open-source efficient deep learning framework/compiler, written in Python. ★737 · Updated 4 months ago
- PyTorch extensions for high performance and large scale training. ★3,394 · Updated 8 months ago
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ★1,854 · Updated 2 weeks ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ★3,043 · Updated last week
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ★4,693 · Updated 2 weeks ago
- Parallelformers: An Efficient Model Parallelization Toolkit for Deployment ★792 · Updated 2 years ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,242 · Updated last year
- SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, … ★2,561 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ★1,426 · Updated last year
- The official implementation of "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training" ★981 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ★1,576 · Updated last year
- Automatically split your PyTorch models on multiple GPUs for training & inference ★657 · Updated 2 years ago
- Training and serving large-scale neural networks with auto parallelization. ★3,173 · Updated 2 years ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ★833 · Updated 4 months ago
- ★413 · Updated 2 years ago
- A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries. ★1,245 · Updated last week
- Implementation of RETRO, DeepMind's Retrieval based Attention net, in PyTorch ★876 · Updated 2 years ago
- A PyTorch quantization backend for Optimum ★1,020 · Updated last month
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ★1,312 · Updated 10 months ago
- Fast Inference Solutions for BLOOM ★565 · Updated last year
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters. ★912 · Updated 2 weeks ago
- Foundation Architecture for (M)LLMs ★3,128 · Updated last year