huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
⭐ 3,052 · Updated this week
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below.
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,048 · Updated last month
- Accessible large language models via k-bit quantization for PyTorch. ⭐ 7,511 · Updated this week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ⭐ 2,304 · Updated 2 weeks ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ⭐ 1,688 · Updated 10 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ⭐ 2,229 · Updated 3 months ago
- Transformer-related optimization, including BERT, GPT ⭐ 6,280 · Updated last year
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ⭐ 1,581 · Updated last year
- Simple, safe way to store and distribute tensors ⭐ 3,414 · Updated last week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ⭐ 3,220 · Updated last month
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐ 2,171 · Updated last year
- PyTorch extensions for high performance and large scale training. ⭐ 3,364 · Updated 4 months ago
- A PyTorch quantization backend for optimum ⭐ 984 · Updated this week
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ⭐ 1,876 · Updated last year
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐ 9,069 · Updated this week
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ⭐ 2,482 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ⭐ 2,666 · Updated last week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ⭐ 2,168 · Updated 10 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ⭐ 1,852 · Updated last year
- PyTorch native quantization and sparsity for training and inference ⭐ 2,291 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ⭐ 4,928 · Updated 4 months ago
- Minimalistic large language model 3D-parallelism training ⭐ 2,164 · Updated this week
- Fast inference engine for Transformer models ⭐ 3,987 · Updated 4 months ago
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ⭐ 2,604 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ⭐ 1,412 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ⭐ 2,150 · Updated 2 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ⭐ 1,861 · Updated this week
- The hub for EleutherAI's work on interpretability and learning dynamics ⭐ 2,601 · Updated 2 months ago
- Foundation Architecture for (M)LLMs ⭐ 3,104 · Updated last year
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ⭐ 3,559 · Updated this week
- Sparsity-aware deep learning inference runtime for CPUs ⭐ 3,157 · Updated 2 months ago