huggingface / optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

☆3,130

Alternatives and similar repositories for optimum

Users who are interested in optimum are comparing it to the libraries listed below.

deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,070 Updated 3 months ago
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆7,687 Updated last week
huggingface / evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
☆2,346 Updated last month
huggingface / safetensors
Simple, safe way to store and distribute tensors
☆3,488 Updated last week
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,318 Updated 3 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,860 Updated this week
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
☆2,260 Updated 5 months ago
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,464 Updated this week
NVIDIA / FasterTransformer
Transformer-related optimization, including BERT, GPT
☆6,331 Updated last year
intel / neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R…
☆2,517 Updated this week
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,384 Updated 6 months ago
intel / intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…
☆2,164 Updated last year
IST-DASLab / gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,209 Updated last year
ELS-RD / transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
☆1,690 Updated last year
huggingface / accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i…
☆9,231 Updated last week
ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab…
☆1,586 Updated last year
huggingface / optimum-quanto
A PyTorch quantization backend for optimum
☆999 Updated last week
marella / ctransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library.
☆1,876 Updated last year
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,688 Updated last month
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,274 Updated last month
AutoGPTQ / AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
☆4,970 Updated 6 months ago
microsoft / Olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
☆2,163 Updated this week
S-LoRA / S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,864 Updated last year
vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆2,149 Updated this week
FasterDecoding / Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,646 Updated last year
microsoft / torchscale
Foundation Architecture for (M)LLMs
☆3,119 Updated last year
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,687 Updated 2 weeks ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,587 Updated 2 months ago
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆823 Updated 2 months ago
triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
☆903 Updated this week