huggingface / optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
⭐ 2,459 · Updated this week
Related projects:
- Accessible large language models via k-bit quantization for PyTorch. ⭐ 6,029 · Updated this week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 1,843 · Updated last week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ⭐ 1,965 · Updated this week
- PyTorch extensions for high-performance and large-scale training. ⭐ 3,149 · Updated 2 weeks ago
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable… ⭐ 1,519 · Updated 7 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. ⭐ 2,333 · Updated 2 months ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models. ⭐ 1,643 · Updated 10 months ago
- ⭐ 2,635 · Updated last week
- Simple, safe way to store and distribute tensors. ⭐ 2,755 · Updated 2 weeks ago
- A framework for few-shot evaluation of language models. ⭐ 6,426 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐ 7,687 · Updated this week
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF). ⭐ 4,442 · Updated 8 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐ 1,875 · Updated 5 months ago
- LLM training code for Databricks foundation models. ⭐ 3,964 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… ⭐ 1,811 · Updated this week
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ⭐ 1,792 · Updated 7 months ago
- General technology for enabling AI capabilities with LLMs and MLLMs. ⭐ 3,561 · Updated this week
- Transformer-related optimization, including BERT, GPT. ⭐ 5,773 · Updated 5 months ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ⭐ 4,326 · Updated last month
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ⭐ 1,624 · Updated this week
- The hub for EleutherAI's work on interpretability and learning dynamics. ⭐ 2,210 · Updated 3 weeks ago
- Fast inference engine for Transformer models. ⭐ 3,218 · Updated this week
- Tools for merging pretrained large language models. ⭐ 4,501 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs. ⭐ 2,080 · Updated this week
- Train transformer language models with reinforcement learning. ⭐ 9,288 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ⭐ 5,121 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. ⭐ 1,314 · Updated 5 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel Pl… ⭐ 2,106 · Updated 3 weeks ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters. ⭐ 1,698 · Updated 7 months ago
- Efficient few-shot learning with Sentence Transformers. ⭐ 2,138 · Updated last week