huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
☆3,110 · Updated this week
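The header describes optimum only at a high level, so here is a minimal sketch of its most common use: exporting a 🤗 Transformers checkpoint to ONNX and running it with ONNX Runtime. It assumes the `optimum[onnxruntime]` extra is installed; the checkpoint name is just an example.

```python
# Minimal optimum sketch (assumes `pip install optimum[onnxruntime]`):
# export a Transformers checkpoint to ONNX, then use the familiar pipeline API.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Hardware-optimized inference with a one-line export."))
```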
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below.
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,063 · Updated 3 months ago
- Accessible large language models via k-bit quantization for PyTorch (see the 4-bit loading sketch after this list). ☆7,627 · Updated this week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ☆2,335 · Updated last week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,689 · Updated 11 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,253 · Updated 4 months ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,504 · Updated last week
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ☆2,192 · Updated last year
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… (see the Accelerate sketch after this list) ☆9,180 · Updated last week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,582 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,763 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,165 · Updated last year
- Transformer-related optimization, including BERT, GPT ☆6,321 · Updated last year
- Simple, safe way to store and distribute tensors (see the safetensors sketch after this list). ☆3,470 · Updated last week
- PyTorch native quantization and sparsity for training and inference ☆2,392 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,289 · Updated 2 months ago
- Minimalistic large language model 3D-parallelism training ☆2,246 · Updated last month
- PyTorch extensions for high performance and large scale training. ☆3,376 · Updated 5 months ago
- A PyTorch quantization backend for optimum ☆989 · Updated last month
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ☆1,878 · Updated last year
- Large Language Model Text Generation Inference ☆10,550 · Updated 3 weeks ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,951 · Updated 5 months ago
- Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs. ☆2,124 · Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,854 · Updated last year
- PyTorch native post-training library ☆5,523 · Updated this week
- Fast inference engine for Transformer models ☆4,047 · Updated 6 months ago
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,679 · Updated 3 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,042 · Updated this week
- ☆2,889 · Updated last week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,420 · Updated last year
- Sparsity-aware deep learning inference runtime for CPUs ☆3,154 · Updated 4 months ago
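The "Accessible large language models via k-bit quantization for PyTorch" entry above is bitsandbytes. Below is a minimal sketch of its 4-bit path through the transformers integration, assuming a CUDA GPU and `bitsandbytes` installed; the model id is only illustrative.

```python
# Hedged sketch: load a causal LM in 4-bit NF4 via the bitsandbytes integration
# in transformers. Requires a CUDA GPU; the checkpoint is only an example.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",                    # example checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```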
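The 🚀 Accelerate entry promises device-agnostic launching; this sketch shows the core pattern with a toy model, where a single `prepare()` call handles device placement and the loop itself stays unchanged. All names here are illustrative.

```python
# Hedged sketch of the core Accelerate pattern: wrap model/optimizer/dataloader
# once, then the same loop runs on CPU, a single GPU, or a distributed setup.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # detects device/distributed config from the environment

model = torch.nn.Linear(8, 1)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
dataloader = DataLoader(dataset, batch_size=16)

# One prepare() call moves everything to the right device(s).
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

loss_fn = torch.nn.MSELoss()
for x, y in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```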
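The "Simple, safe way to store and distribute tensors" entry is safetensors. A minimal sketch of its PyTorch API: a flat dict of tensors is saved and reloaded without pickle, so loading untrusted files cannot execute arbitrary code.

```python
# Hedged sketch of the safetensors PyTorch API: pickle-free serialization of
# a flat dict of tensors, round-tripped through a local file.
import torch
from safetensors.torch import save_file, load_file

tensors = {"weight": torch.zeros(4, 4), "bias": torch.zeros(4)}
save_file(tensors, "model.safetensors")

restored = load_file("model.safetensors")  # loads onto CPU by default
assert torch.equal(restored["weight"], tensors["weight"])
```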