huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
★ 2,929 · Updated last week
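A minimal sketch of what optimum's hardware-acceleration workflow looks like with its ONNX Runtime backend. This is a sketch, not canonical usage: it assumes `optimum[onnxruntime]` is installed, and the checkpoint name is only an example.

```python
# pip install "optimum[onnxruntime]"  (assumed)
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint

# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The ONNX model drops into the usual transformers pipeline API
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Hardware-accelerated inference is neat."))
```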
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below:
- Accessible large language models via k-bit quantization for PyTorch (see the 4-bit loading sketch after this list). ★ 7,088 · Updated last week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★ 2,014 · Updated 2 months ago
- PyTorch extensions for high performance and large scale training. ★ 3,328 · Updated last month
- Transformer related optimization, including BERT, GPT. ★ 6,179 · Updated last year
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… (see the Accelerator sketch after this list). ★ 8,771 · Updated last week
- Ongoing research training transformer models at scale. ★ 12,468 · Updated this week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. ★ 3,041 · Updated 3 weeks ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ★ 2,176 · Updated 3 weeks ago
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets (see the metric sketch after this list). ★ 2,228 · Updated 4 months ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm (see the GPTQ quantization sketch after this list). ★ 4,857 · Updated last month
- PyTorch native post-training library. ★ 5,233 · Updated this week
- Simple, safe way to store and distribute tensors (see the safetensors sketch after this list). ★ 3,282 · Updated this week
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★ 2,119 · Updated last year
- Minimalistic large language model 3D-parallelism training. ★ 1,898 · Updated this week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ★ 1,687 · Updated 7 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ★ 2,170 · Updated 7 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ★ 2,450 · Updated this week
- A framework for few-shot evaluation of language models (see the evaluation-harness sketch after this list). ★ 9,126 · Updated this week
- Hackable and optimized Transformers building blocks, supporting a composable construction (see the attention sketch after this list). ★ 9,548 · Updated last week
- Large Language Model Text Generation Inference (see the generation-client sketch after this list). ★ 10,172 · Updated this week
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads. ★ 2,532 · Updated 11 months ago
- Training and serving large-scale neural networks with auto parallelization. ★ 3,136 · Updated last year
- ★ 2,819 · Updated last week
- Tools for merging pretrained large language models. ★ 5,774 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★ 2,396 · Updated last week
- Efficient few-shot learning with Sentence Transformers. ★ 2,486 · Updated last month
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF). ★ 4,659 · Updated last year
- PyTorch native quantization and sparsity for training and inference (see the weight-only quantization sketch after this list). ★ 2,072 · Updated this week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ★ 1,568 · Updated last year
- A PyTorch quantization backend for optimum. ★ 946 · Updated 2 weeks ago
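The k-bit quantization entry above matches the bitsandbytes project. A minimal 4-bit loading sketch through the transformers integration, assuming `bitsandbytes` is installed and a CUDA GPU is available; the checkpoint name is only an example:

```python
# pip install bitsandbytes transformers accelerate  (assumed; requires a CUDA GPU)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # quantize Linear weights to 4-bit on load
    bnb_4bit_quant_type="nf4",            # NF4 data type
    bnb_4bit_compute_dtype=torch.float16, # matmuls still run in fp16
)

model_id = "facebook/opt-350m"  # example checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Quantization lets you", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```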
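The launch/train entry matches 🤗 Accelerate. The Accelerator sketch below shows its prepare/backward pattern on a toy model; everything here (model, data, hyperparameters) is illustrative:

```python
# pip install accelerate  (assumed)
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

accelerator = Accelerator()  # reads device / DDP / mixed-precision setup from the environment

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(TensorDataset(torch.randn(32, 10), torch.randn(32, 1)), batch_size=8)

# prepare() moves everything to the right device and wraps it for the chosen configuration
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```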
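For the 🤗 Evaluate entry, the metric sketch: `evaluate.load` pulls a metric by name and `compute` scores predictions against references (the accuracy metric is backed by scikit-learn):

```python
# pip install evaluate scikit-learn  (assumed)
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75} — 3 of 4 predictions match
```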
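The GPTQ-based quantization package appears to be AutoGPTQ. A hedged GPTQ quantization sketch following the pattern of its documented API; the checkpoint and calibration text are examples, and a CUDA GPU is assumed:

```python
# pip install auto-gptq transformers  (assumed; requires a CUDA GPU)
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "facebook/opt-125m"  # example checkpoint
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ calibrates on a handful of real text samples
examples = [tokenizer("GPTQ calibrates quantization on real text.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```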
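The safetensors sketch: `save_file`/`load_file` are the library's documented entry points, so this should run as written:

```python
# pip install safetensors torch  (assumed)
import torch
from safetensors.torch import save_file, load_file

tensors = {"embedding.weight": torch.zeros(4, 8), "head.bias": torch.ones(2)}
save_file(tensors, "model.safetensors")  # no pickle, so no arbitrary code execution on load
loaded = load_file("model.safetensors")  # flat dict of tensors back out
print(loaded["head.bias"])
```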
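The few-shot evaluation framework matches lm-evaluation-harness. An evaluation-harness sketch assuming the v0.4-style `simple_evaluate` API; the model and task names are examples:

```python
# pip install lm-eval  (assumed)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # example checkpoint
    tasks=["lambada_openai"],                     # example task
    limit=10,  # only a few documents, for a quick smoke test
)
print(results["results"])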
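The Transformers building-blocks entry matches xformers. The attention sketch below calls its memory-efficient attention kernel, which avoids materializing the full attention matrix; a CUDA GPU is assumed:

```python
# pip install xformers  (assumed; requires a CUDA GPU)
import torch
import xformers.ops as xops

# (batch, seq_len, heads, head_dim) layout expected by memory_efficient_attention
q = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(q, k, v)  # dispatches to a fused kernel
print(out.shape)  # torch.Size([2, 128, 8, 64])
```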
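The generation-client sketch for the text-generation-inference entry: it assumes a TGI server is already running locally (the docker command in the comment is indicative, not exact), and talks to it with the `huggingface_hub` client:

```python
# pip install huggingface_hub  (assumed). A TGI server is assumed at localhost:8080, e.g.:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
#       --model-id <some-model-id>
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # point at the TGI endpoint
print(client.text_generation("The fastest way to serve an LLM is", max_new_tokens=32))
```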
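The PyTorch-native quantization entry matches torchao. A weight-only quantization sketch, hedged because torchao's API has shifted across releases; `quantize_` and `int8_weight_only` are from the 0.4-era API, and the kernels are best supported on CUDA:

```python
# pip install torchao  (assumed)
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8))

# Swap Linear weights to int8 in place; activations stay in the original dtype
quantize_(model, int8_weight_only())
print(model(torch.randn(1, 64)).shape)
```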