huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
☆3,176 · Updated this week
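To make the "hardware optimization tools" claim concrete, here is a minimal, hedged sketch of optimum's ONNX Runtime backend: it exports a Transformers checkpoint to ONNX and runs inference through ONNX Runtime. It assumes `pip install optimum[onnxruntime]`, and the model id is only an example.

```python
# Minimal sketch (not from this page): accelerate a Transformers model with
# optimum's ONNX Runtime backend. Assumes `pip install optimum[onnxruntime]`.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint

# export=True converts the PyTorch checkpoint to ONNX on the fly,
# then loads it with ONNX Runtime for faster CPU/GPU inference.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes ONNX Runtime inference easy."))
```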
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,074 · Updated 4 months ago
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ☆2,359 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ☆2,925 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch (a usage sketch follows after this list). ☆7,755 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,164 · Updated last year
- PyTorch native quantization and sparsity for training and inference ☆2,511 · Updated this week
- Simple, safe way to store and distribute tensors ☆3,511 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,347 · Updated 4 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ☆2,219 · Updated last year
- SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, … ☆2,525 · Updated this week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,688 · Updated last year
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,269 · Updated 6 months ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ☆9,289 · Updated last week
- PyTorch extensions for high performance and large scale training. ☆3,385 · Updated 6 months ago
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,585 · Updated last year
- A PyTorch quantization backend for optimum ☆1,009 · Updated 3 weeks ago
- Transformer-related optimization, including BERT, GPT ☆6,348 · Updated last year
- Minimalistic large language model 3D-parallelism training ☆2,323 · Updated 2 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,868 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,726 · Updated this week
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,660 · Updated last year
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,989 · Updated 7 months ago
- Python bindings for Transformer models implemented in C/C++ using the GGML library. ☆1,875 · Updated last year
- PyTorch native post-training library ☆5,595 · Updated this week
- Fast inference engine for Transformer models ☆4,142 · Updated last week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,730 · Updated this week
- Large Language Model Text Generation Inference ☆10,656 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,238 · Updated this week
- Sparsity-aware deep learning inference runtime for CPUs ☆3,161 · Updated 5 months ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ☆1,007 · Updated last year
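Several of the entries above are quantization libraries. For the k-bit quantization entry referenced in the list, the sketch below is a hedged illustration of loading a causal LM with 4-bit weights through the Transformers / bitsandbytes integration; it assumes a CUDA GPU, `pip install transformers accelerate bitsandbytes`, and an example model id.

```python
# Minimal sketch (not from this page): 4-bit weight loading via the
# Transformers / bitsandbytes integration. Assumes a CUDA GPU; the
# model id is only an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize linear-layer weights to 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

model_id = "facebook/opt-350m"  # example checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # let accelerate place the layers
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Quantized models fit in less memory because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```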