huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
☆3,130 · Updated this week
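As a taste of what optimum provides, here is a minimal sketch of its ONNX Runtime path: exporting a Transformers checkpoint to ONNX and serving it through the familiar pipeline API. It assumes the `optimum[onnxruntime]` extra is installed; the model ID is only an example checkpoint.

```python
# Minimal sketch: export a Transformers model to ONNX via optimum and run it
# with ONNX Runtime. Assumes `pip install optimum[onnxruntime]`.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch weights to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

# The ONNX-backed model drops into the standard transformers pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Hardware-optimized inference with the same pipeline API."))
```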
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below.
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.☆2,070 · Updated 3 months ago
- Accessible large language models via k-bit quantization for PyTorch; see the 8-bit loading sketch after this list.☆7,687 · Updated last week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets.☆2,346 · Updated last month
- Simple, safe way to store and distribute tensors.☆3,488 · Updated last week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.☆3,318 · Updated 3 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…☆2,860 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:☆2,260 · Updated 5 months ago
- PyTorch-native quantization and sparsity for training and inference.☆2,464 · Updated this week
- Transformer-related optimization, including BERT, GPT.☆6,331 · Updated last year
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R…☆2,517 · Updated this week
- PyTorch extensions for high-performance and large-scale training.☆3,384 · Updated 6 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…☆2,164 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".☆2,209 · Updated last year
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀☆1,690 · Updated last year
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i…☆9,231 · Updated last week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab…☆1,586 · Updated last year
- A PyTorch quantization backend for optimum.☆999 · Updated last week
- Python bindings for the Transformer models implemented in C/C++ using the GGML library.☆1,876 · Updated last year
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…☆4,688 · Updated last month
- Minimalistic large language model 3D-parallelism training.☆2,274 · Updated last month
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.☆4,970 · Updated 6 months ago
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.☆2,163 · Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters.☆1,864 · Updated last year
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM.☆2,149 · Updated this week
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads.☆2,646 · Updated last year
- Foundation Architecture for (M)LLMs.☆3,119 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks.☆2,687 · Updated 2 weeks ago
- Efficient few-shot learning with Sentence Transformers.☆2,587 · Updated 2 months ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.☆823 · Updated 2 months ago
- The Triton TensorRT-LLM Backend.☆903 · Updated this week
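The 8-bit loading sketch referenced in the bitsandbytes entry above looks roughly like this: loading a causal LM through the documented transformers integration, with `BitsAndBytesConfig` as the entry point. The model ID and prompt are only examples, and the 8-bit kernels require a CUDA GPU.

```python
# Minimal sketch: load a causal LM in 8-bit via the bitsandbytes integration
# in transformers. Assumes `pip install transformers bitsandbytes accelerate`
# and an available CUDA device.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # example checkpoint
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,  # weights are quantized to int8 on load
    device_map="auto",                 # places layers on the available GPU(s)
)

inputs = tokenizer("Quantization trades a little accuracy for", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

The same pattern extends to 4-bit loading by swapping in `load_in_4bit=True`, which is how several of the quantization libraries listed above are typically compared in practice.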