Accessible large language models via k-bit quantization for PyTorch.
★8,092 · Mar 31, 2026 · Updated last week
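bitsandbytes compresses model weights to k-bit integers. As a hedged illustration of the underlying idea (not the library's actual implementation, which uses optimized CUDA kernels and block-wise scaling), here is a toy absmax 8-bit quantize/dequantize round trip in plain Python:

```python
# Toy sketch of absmax 8-bit quantization, the basic idea behind k-bit
# weight compression. Illustrative only; bitsandbytes itself uses
# block-wise scaling and fused CUDA kernels, not this code.

def quantize_absmax(weights):
    """Map floats to int8 codes in [-127, 127] using one absmax scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.1, -0.5, 1.2, -1.21, 0.0]
codes, scale = quantize_absmax(weights)
recovered = dequantize(codes, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, recovered))
```

The largest-magnitude weight maps exactly to ±127; everything else is rounded to the nearest of 255 evenly spaced levels, which is why a single outlier can hurt precision and why production libraries quantize in small blocks.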
Alternatives and similar repositories for bitsandbytes
Users that are interested in bitsandbytes are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention ★23,185 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★20,895 · Apr 2, 2026 · Updated last week
- QLoRA: Efficient Finetuning of Quantized LLMs ★10,865 · Jun 10, 2024 · Updated last year
- Hackable and optimized Transformers building blocks, supporting a composable construction. ★10,411 · Mar 30, 2026 · Updated last week
- An easy-to-use LLMs quantization package with user-friendly APIs, based on the GPTQ algorithm. ★5,042 · Apr 11, 2025 · Updated 11 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,282 · Mar 27, 2024 · Updated 2 years ago
- Transformer-related optimization, including BERT, GPT ★6,410 · Mar 27, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★3,488 · Jul 17, 2025 · Updated 8 months ago
- Development repository for the Triton language and compiler ★18,840 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★41,977 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ★75,637 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,596 · Apr 2, 2026 · Updated last week
- Large Language Model Text Generation Inference ★10,817 · Mar 21, 2026 · Updated 2 weeks ago
- Train transformer language models with reinforcement learning. ★17,967 · Updated this week
- Ongoing research training transformer models at scale ★15,900 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ★2,322 · May 11, 2025 · Updated 10 months ago
- A framework for few-shot evaluation of language models. ★12,020 · Apr 1, 2026 · Updated last week
- SGLang is a high-performance serving framework for large language models and multimodal models. ★25,408 · Updated this week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ★13,304 · Updated this week
- Tensor library for machine learning ★14,340 · Apr 2, 2026 · Updated last week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ★13,383 · Dec 17, 2024 · Updated last year
- 4-bit quantization of LLaMA using GPTQ ★3,071 · Jul 13, 2024 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ★1,631 · Jul 12, 2024 · Updated last year
- Instruct-tune LLaMA on consumer hardware ★18,954 · Jul 29, 2024 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ★3,256 · Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ★39,447 · Jun 2, 2025 · Updated 10 months ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★3,348 · Apr 2, 2026 · Updated last week
- PyTorch native quantization and sparsity for training and inference ★2,756 · Updated this week
- FlashInfer: Kernel Library for LLM Serving ★5,273 · Updated this week
- Tools for merging pretrained large language models. ★6,945 · Mar 15, 2026 · Updated 3 weeks ago
- Running large language models on a single GPU for throughput-oriented scenarios. ★9,376 · Oct 28, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★22,077 · Jan 23, 2026 · Updated 2 months ago
- PyTorch extensions for high performance and large scale training. ★3,404 · Apr 26, 2025 · Updated 11 months ago
- Go ahead and axolotl questions ★11,608 · Updated this week
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ★6,190 · Aug 22, 2025 · Updated 7 months ago
- PyTorch native post-training library ★5,720 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★7,208 · Jul 11, 2024 · Updated last year
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… ★14,458 · Mar 30, 2026 · Updated last week
- Code and documentation to train Stanford's Alpaca models, and generate the data. ★30,264 · Jul 17, 2024 · Updated last year
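Several entries in the list above (loralib, QLoRA, PEFT, alpaca-lora) revolve around LoRA: freezing a base weight matrix W and training only a low-rank update B·A scaled by alpha/r. A minimal pure-Python sketch of that effective-weight computation, with toy numbers and a hand-rolled matmul (names and shapes here are illustrative, not loralib's API):

```python
# Toy sketch of the LoRA update: effective weight = W + (alpha / r) * (B @ A),
# where W (d_out x d_in) is frozen and only B (d_out x r) and A (r x d_in)
# are trained. Illustrative only; real implementations are PyTorch modules.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha, r):
    """Return the effective weight W + (alpha / r) * (B @ A)."""
    delta = matmul(B, A)
    s = alpha / r
    return [[w + s * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
B = [[1.0], [2.0]]             # trainable, d_out x r with r = 1
A = [[0.5, 0.5]]               # trainable, r x d_in
W_eff = lora_weight(W, A, B, alpha=1.0, r=1)
# B @ A = [[0.5, 0.5], [1.0, 1.0]], so W_eff = [[1.5, 0.5], [1.0, 2.0]]
```

With rank r much smaller than the matrix dimensions, B and A hold far fewer trainable parameters than W, which is why LoRA-style methods (and their quantized variant QLoRA, also listed above) fine-tune large models cheaply.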