huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
⭐2,825 · Updated 3 weeks ago
Alternatives and similar repositories for optimum:
Users interested in optimum are comparing it to the libraries listed below.
- Accessible large language models via k-bit quantization for PyTorch. ⭐6,868 · Updated last week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐1,998 · Updated last week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐8,543 · Updated last week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ⭐2,895 · Updated last week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ⭐1,560 · Updated last year
- Transformer related optimization, including BERT, GPT ⭐6,100 · Updated last year
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ⭐2,168 · Updated 2 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ⭐2,048 · Updated 3 weeks ago
- Fast inference engine for Transformer models ⭐3,708 · Updated this week
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ⭐2,365 · Updated this week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ⭐1,683 · Updated 5 months ago
- PyTorch extensions for high performance and large scale training. ⭐3,285 · Updated 2 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐2,067 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… ⭐2,311 · Updated last week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ⭐2,166 · Updated 5 months ago
- PyTorch native quantization and sparsity for training and inference ⭐1,927 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ⭐4,781 · Updated 2 weeks ago
- Minimalistic large language model 3D-parallelism training ⭐1,737 · Updated this week
- Efficient few-shot learning with Sentence Transformers ⭐2,424 · Updated 2 months ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ⭐4,614 · Updated last year
- Simple, safe way to store and distribute tensors ⭐3,197 · Updated 2 weeks ago
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ⭐1,855 · Updated last year
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ⭐3,064 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ⭐2,034 · Updated last week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ⭐1,810 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ⭐1,380 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ⭐2,432 · Updated 3 weeks ago
- Foundation Architecture for (M)LLMs ⭐3,067 · Updated 11 months ago
- Tools for merging pretrained large language models. ⭐5,498 · Updated this week
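Many of the repositories above (AWQ, AutoAWQ, GPTQ and the PyTorch-native quantization tooling) revolve around low-bit weight quantization. As a library-agnostic illustration of the generic idea, not any specific repo's algorithm, here is a minimal sketch of symmetric round-to-nearest INT4 quantization with per-group scales in plain Python (the function names and group size are made up for this example):

```python
def quantize_int4(weights, group_size=4):
    """Symmetric round-to-nearest quantization to signed INT4 codes in [-7, 7],
    with one float scale per group of weights (per-group quantization)."""
    qweights, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Scale maps the largest-magnitude weight in the group to +/-7.
        scale = max(abs(w) for w in group) / 7 or 1.0  # 1.0 avoids div-by-zero
        scales.append(scale)
        qweights.extend(max(-7, min(7, round(w / scale))) for w in group)
    return qweights, scales

def dequantize_int4(qweights, scales, group_size=4):
    """Map INT4 codes back to floats using the per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]

w = [0.12, -0.56, 0.33, 0.90, -1.40, 0.07, 0.65, -0.28]
qw, s = quantize_int4(w)
wr = dequantize_int4(qw, s)
# Round-to-nearest bounds the error per weight by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(w, wr))
```

Smaller groups give more accurate scales at the cost of more metadata; the calibration-based methods listed above (GPTQ, AWQ) go further by using activation statistics to decide how to round, which is what makes them more accurate than this plain round-to-nearest sketch.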