ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
⭐ 1,572 · Updated last year
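A minimal sketch of the advertised one-line optimization, assuming kernl's `optimize_model` entry point under `kernl.model_optimization`, a CUDA GPU, and fp16 autocast (check the repository for the exact API and the set of supported models):

```python
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # assumed import path

# Load a Hugging Face transformer and move it to the GPU in eval mode.
model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()

# The advertised "single line": patches the model in place so eligible
# modules run through fused Triton kernels.
optimize_model(model)

inputs = {
    "input_ids": torch.ones((1, 16), dtype=torch.long, device="cuda"),
    "attention_mask": torch.ones((1, 16), dtype=torch.long, device="cuda"),
}
with torch.inference_mode(), torch.cuda.amp.autocast():
    outputs = model(**inputs)
```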
Alternatives and similar repositories for kernl
Users interested in kernl are comparing it to the libraries listed below.
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ⭐ 1,053 · Updated last year
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ⭐ 1,688 · Updated 8 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐ 2,131 · Updated last year
- Library for 8-bit optimizers and quantization routines (see the short usage sketch after this list). ⭐ 715 · Updated 2 years ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,020 · Updated 3 months ago
- Cramming the training of a (BERT-type) language model into limited compute. ⭐ 1,336 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ⭐ 1,427 · Updated 11 months ago
- Pipeline Parallelism for PyTorch ⭐ 768 · Updated 10 months ago
- Maximal update parametrization (µP) ⭐ 1,544 · Updated 11 months ago
- An open-source efficient deep learning framework/compiler, written in Python. ⭐ 704 · Updated last week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ⭐ 1,258 · Updated 3 months ago
- Training and serving large-scale neural networks with auto parallelization. ⭐ 3,138 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ⭐ 2,507 · Updated last week
- PyTorch extensions for high performance and large scale training. ⭐ 3,335 · Updated 2 months ago
- ⭐ 543 · Updated 6 months ago
- Fast Inference Solutions for BLOOM ⭐ 564 · Updated 8 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ⭐ 1,397 · Updated last year
- The official implementation of "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training" ⭐ 963 · Updated last year
- Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch ⭐ 866 · Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ⭐ 692 · Updated 10 months ago
- Foundation Architecture for (M)LLMs ⭐ 3,083 · Updated last year
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ⭐ 1,000 · Updated 10 months ago
- Parallelformers: An Efficient Model Parallelization Toolkit for Deployment ⭐ 790 · Updated 2 years ago
- Tutel MoE: Optimized Mixture-of-Experts Library, supports DeepSeek FP8/FP4 ⭐ 844 · Updated this week
- Fast & Simple repository for pre-training and fine-tuning T5-style models ⭐ 1,005 · Updated 10 months ago
- Language Modeling with the H3 State Space Model ⭐ 519 · Updated last year
- A PyTorch quantization backend for Optimum ⭐ 955 · Updated last week
- Microsoft Automatic Mixed Precision Library ⭐ 610 · Updated 8 months ago
- Transformer related optimization, including BERT, GPT ⭐ 6,219 · Updated last year
- ⭐ 411 · Updated last year
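For the 8-bit optimizers and quantization library listed above, the typical pattern is a drop-in swap of the stock PyTorch optimizer. A minimal sketch, assuming bitsandbytes' `bnb.optim.Adam8bit` class and an available CUDA GPU:

```python
# Hedged sketch: swaps torch.optim.Adam for an 8-bit Adam that keeps
# optimizer state in 8 bits, reducing optimizer memory during training.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Drop-in replacement for torch.optim.Adam.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```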