🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
⭐ 3,332 · Mar 13, 2026 · Updated last week
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below.
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… — ⭐ 9,563 · Updated this week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. — ⭐ 2,429 · Mar 10, 2026 · Updated last week
- Large Language Model Text Generation Inference — ⭐ 10,812 · Jan 8, 2026 · Updated 2 months ago
- Accessible large language models via k-bit quantization for PyTorch. — ⭐ 8,052 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. — ⭐ 20,809 · Updated this week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 — ⭐ 1,687 · Oct 23, 2024 · Updated last year
- Transformer related optimization, including BERT, GPT — ⭐ 6,397 · Mar 27, 2024 · Updated last year
- Simple, safe way to store and distribute tensors — ⭐ 3,660 · Mar 12, 2026 · Updated last week
- Fast and memory-efficient exact attention — ⭐ 22,832 · Updated this week
- 🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools — ⭐ 21,289 · Updated this week
- 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch. — ⭐ 33,085 · Updated this week
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production — ⭐ 10,529 · Feb 28, 2026 · Updated 3 weeks ago
- Train transformer language models with reinforcement learning. — ⭐ 17,697 · Updated this week
- Hackable and optimized Transformers building blocks, supporting a composable construction. — ⭐ 10,373 · Updated this week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. — ⭐ 2,101 · Jun 30, 2025 · Updated 8 months ago
- Efficient few-shot learning with Sentence Transformers — ⭐ 2,699 · Dec 11, 2025 · Updated 3 months ago
- A pytorch quantization backend for optimum — ⭐ 1,032 · Nov 21, 2025 · Updated 4 months ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools — ⭐ 549 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. — ⭐ 41,869 · Updated this week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… — ⭐ 13,120 · Updated this week
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. — ⭐ 10,446 · Updated this week
- An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. — ⭐ 5,034 · Apr 11, 2025 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs — ⭐ 73,479 · Updated this week
- A blazing fast inference solution for text embeddings models — ⭐ 4,600 · Mar 13, 2026 · Updated last week
- SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, … — ⭐ 2,598 · Updated this week
- ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator — ⭐ 19,568 · Updated this week
- Development repository for the Triton language and compiler — ⭐ 18,708 · Updated this week
- PyTorch extensions for high performance and large scale training. — ⭐ 3,403 · Apr 26, 2025 · Updated 10 months ago
- PyTorch native quantization and sparsity for training and inference — ⭐ 2,739 · Updated this week
- State-of-the-Art Text Embeddings — ⭐ 18,427 · Mar 12, 2026 · Updated last week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… — ⭐ 1,585 · Jan 28, 2026 · Updated last month
- Foundation Architecture for (M)LLMs — ⭐ 3,135 · Apr 11, 2024 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. — ⭐ 2,956 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… — ⭐ 3,231 · Updated this week
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… — ⭐ 16,918 · Updated this week
- Fast inference engine for Transformer models — ⭐ 4,368 · Feb 4, 2026 · Updated last month
- SGLang is a high-performance serving framework for large language models and multimodal models. — ⭐ 24,829 · Updated this week
- Minimalistic large language model 3D-parallelism training — ⭐ 2,617 · Feb 19, 2026 · Updated last month
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… — ⭐ 331 · Sep 25, 2025 · Updated 5 months ago