huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
⭐ 3,005 · Updated last week
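For orientation before the list of alternatives, here is a minimal sketch of what optimum's hardware-optimization workflow looks like through its ONNX Runtime integration; the checkpoint name is an illustrative choice, and exact arguments may differ across optimum versions.

```python
# Minimal sketch: exporting a Transformers checkpoint to ONNX and running it
# through ONNX Runtime via optimum (checkpoint choice is illustrative).
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed example model

# export=True converts the PyTorch weights to ONNX on the fly, after which
# inference runs in ONNX Runtime rather than eager PyTorch.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Hardware-optimized inference with optimum."))
```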
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below.
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ⭐ 2,285 · Updated 3 weeks ago
- Accessible large language models via k-bit quantization for PyTorch (see the 4-bit loading sketch after this list). ⭐ 7,427 · Updated last week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,044 · Updated last month
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ⭐ 2,225 · Updated 2 months ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐ 9,010 · Updated this week
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐ 2,155 · Updated last year
- Simple, safe way to store and distribute tensors. ⭐ 3,380 · Updated this week
- PyTorch-native quantization and sparsity for training and inference. ⭐ 2,227 · Updated this week
- PyTorch extensions for high-performance and large-scale training. ⭐ 3,352 · Updated 3 months ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ⭐ 1,687 · Updated 9 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. ⭐ 3,189 · Updated 3 weeks ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ⭐ 2,602 · Updated this week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ⭐ 1,577 · Updated last year
- A PyTorch quantization backend for optimum. ⭐ 979 · Updated last month
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ⭐ 2,465 · Updated this week
- Transformer-related optimization, including BERT and GPT. ⭐ 6,267 · Updated last year
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs. ⭐ 2,036 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ⭐ 2,170 · Updated 10 months ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ⭐ 4,911 · Updated 3 months ago
- ⭐ 2,858 · Updated 2 months ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ⭐ 1,873 · Updated last year
- Minimalistic large language model 3D-parallelism training. ⭐ 2,101 · Updated 3 weeks ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters. ⭐ 1,845 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ⭐ 2,524 · Updated last week
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ⭐ 814 · Updated last week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. ⭐ 1,406 · Updated last year
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM. ⭐ 1,726 · Updated last week
- ⭐ 1,589 · Updated 2 years ago
- Sparsity-aware deep learning inference runtime for CPUs. ⭐ 3,160 · Updated 2 months ago
- 4-bit quantization of LLaMA using GPTQ. ⭐ 3,061 · Updated last year
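As referenced in the bitsandbytes entry above, here is a minimal sketch of k-bit model loading through its transformers integration; the checkpoint and config values are illustrative assumptions, not prescriptions, and the snippet assumes a recent transformers/bitsandbytes pairing plus a CUDA GPU.

```python
# Minimal sketch: loading a causal LM in 4-bit with bitsandbytes
# (checkpoint and settings are illustrative; requires a CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 storage type
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

model_id = "facebook/opt-1.3b"  # assumed example model
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("4-bit quantization mainly saves", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```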