huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
★2,892 · Updated this week
Alternatives and similar repositories for optimum
Users interested in optimum are comparing it to the libraries listed below.
- Accessible large language models via k-bit quantization for PyTorch. ★7,020 · Updated this week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ★2,207 · Updated 4 months ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★8,708 · Updated this week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,009 · Updated last month
- Simple, safe way to store and distribute tensors ★3,257 · Updated last week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★2,991 · Updated this week
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,104 · Updated last year
- PyTorch extensions for high performance and large scale training. ★3,313 · Updated 2 weeks ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ★1,682 · Updated 6 months ago
- Transformer related optimization, including BERT, GPT ★6,152 · Updated last year
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ★4,838 · Updated last month
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ★2,402 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ★2,400 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ★2,155 · Updated this week
- PyTorch native post-training library ★5,171 · Updated this week
- AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ★4,633 · Updated last month
- Minimalistic large language model 3D-parallelism training ★1,850 · Updated this week
- A framework for few-shot evaluation of language models. ★8,904 · Updated this week
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ★2,522 · Updated 10 months ago
- Large language models (LLMs) made easy; EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ★2,476 · Updated 9 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics ★2,476 · Updated last week
- ★2,807 · Updated 2 weeks ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ★4,641 · Updated last year
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ★4,713 · Updated 5 months ago
- A PyTorch quantization backend for optimum ★935 · Updated 3 weeks ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ★2,171 · Updated 7 months ago
- Large Language Model Text Generation Inference ★10,101 · Updated this week
- PyTorch native quantization and sparsity for training and inference ★2,030 · Updated this week
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models … ★2,214 · Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ★1,822 · Updated last year