huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
★3,085 · Updated this week
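A minimal sketch of how optimum's ONNX Runtime backend is typically used (assumes `pip install optimum[onnxruntime]`; the checkpoint name is only an example):

```python
# Export a transformers checkpoint to ONNX and run it through ONNX Runtime.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)  # on-the-fly ONNX export

inputs = tokenizer("Optimum runs this through ONNX Runtime.", return_tensors="pt")
print(model(**inputs).logits.argmax(-1))
```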
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below.
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,053 · Updated 2 months ago
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets (usage sketch after this list). ★2,320 · Updated last month
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ★2,729 · Updated this week
- Simple, safe way to store and distribute tensors (usage sketch after this list). ★3,446 · Updated last week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ★1,582 · Updated last year
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ★1,689 · Updated 10 months ago
- Accessible large language models via k-bit quantization for PyTorch (usage sketch after this list). ★7,584 · Updated this week
- PyTorch extensions for high-performance and large-scale training. ★3,369 · Updated 4 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ★2,247 · Updated 4 months ago
- Transformer-related optimization, including BERT, GPT ★6,300 · Updated last year
- PyTorch-native quantization and sparsity for training and inference ★2,361 · Updated this week
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,180 · Updated last year
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★3,246 · Updated 2 months ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ★2,492 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… (usage sketch after this list) ★9,133 · Updated this week
- A PyTorch quantization backend for optimum ★987 · Updated 3 weeks ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ★2,170 · Updated 11 months ago
- Minimalistic large language model 3D-parallelism training ★2,212 · Updated 2 weeks ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ★1,877 · Updated last year
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ★1,853 · Updated last year
- ★2,882 · Updated 2 weeks ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ★4,942 · Updated 5 months ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ★1,007 · Updated last year
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ★4,675 · Updated 3 weeks ago
- Sparsity-aware deep learning inference runtime for CPUs ★3,158 · Updated 3 months ago
- Fast inference engine for Transformer models ★4,021 · Updated 5 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ★1,417 · Updated last year
- PyTorch-native post-training library ★5,493 · Updated this week
- Training and serving large-scale neural networks with auto parallelization. ★3,153 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,616 · Updated 3 weeks ago
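As referenced in the 🤗 Evaluate entry above, a minimal sketch of its metric API (assumes `pip install evaluate`; the metric name and toy labels are illustrative):

```python
# Load a metric by name and score a set of predictions against references.
import evaluate

accuracy = evaluate.load("accuracy")  # fetches the metric script on first use
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # e.g. {'accuracy': 0.75}
```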
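For the safetensors entry, a short sketch of its documented PyTorch helpers (`save_file`/`load_file`); the tensor names are arbitrary:

```python
# Save and reload tensors without pickle; safetensors files are safe to
# load from untrusted sources and can be memory-mapped efficiently.
import torch
from safetensors.torch import save_file, load_file

tensors = {"embedding.weight": torch.zeros(10, 4), "lm_head.weight": torch.zeros(4, 10)}
save_file(tensors, "model.safetensors")
restored = load_file("model.safetensors")  # returns a dict[str, torch.Tensor]
assert torch.equal(restored["embedding.weight"], tensors["embedding.weight"])
```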
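For the bitsandbytes entry, a hedged sketch of 8-bit loading through its transformers integration (assumes a CUDA GPU with `bitsandbytes` and `accelerate` installed; the model id is only an example):

```python
# Load a causal LM with LLM.int8() weight quantization via BitsAndBytesConfig.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # example checkpoint, not a recommendation
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # requires accelerate; places layers on available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Quantization lets this model", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```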
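Finally, for the 🤗 Accelerate entry, a minimal training-loop sketch; the linear model and random data are placeholders that keep the example self-contained:

```python
# Accelerator.prepare() wraps the model, optimizer, and dataloader for the
# current device and distributed setup; accelerator.backward() replaces
# loss.backward() so mixed precision and gradient scaling work transparently.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(80, 16), torch.randint(0, 2, (80,))), batch_size=8)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
for inputs, targets in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)
    optimizer.step()
```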