Accessible large language models via k-bit quantization for PyTorch.
★ 8,258 · Jun 5, 2026 · Updated this week
Alternatives and similar repositories for bitsandbytes
Users that are interested in bitsandbytes are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention ★ 24,037 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★ 21,226 · Jun 1, 2026 · Updated last week
- QLoRA: Efficient Finetuning of Quantized LLMs ★ 10,925 · Jun 10, 2024 · Updated last year
- Hackable and optimized Transformers building blocks, supporting a composable construction. ★ 10,484 · May 21, 2026 · Updated 2 weeks ago
- An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. ★ 5,068 · Apr 11, 2025 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★ 2,318 · Mar 27, 2024 · Updated 2 years ago
- Transformer related optimization, including BERT, GPT ★ 6,419 · Mar 27, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★ 3,556 · Jul 17, 2025 · Updated 10 months ago
- Development repository for the Triton language and compiler ★ 19,380 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★ 42,478 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★ 9,711 · Jun 2, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ★ 81,909 · Updated this week
- Large Language Model Text Generation Inference ★ 10,859 · Mar 21, 2026 · Updated 2 months ago
- Train transformer language models with reinforcement learning. ★ 18,547 · Updated this week
- Ongoing research training transformer models at scale ★ 16,617 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ★ 2,341 · May 11, 2025 · Updated last year
- A framework for few-shot evaluation of language models. ★ 12,783 · May 11, 2026 · Updated 3 weeks ago
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ★ 13,793 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ★ 28,886 · Updated this week
- Tensor library for machine learning ★ 14,770 · May 29, 2026 · Updated last week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ★ 13,579 · Dec 17, 2024 · Updated last year
- 4 bits quantization of LLaMA using GPTQ ★ 3,073 · Jul 13, 2024 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ★ 1,658 · Jul 12, 2024 · Updated last year
- Instruct-tune LLaMA on consumer hardware ★ 18,920 · Jul 29, 2024 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ★ 3,381 · Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ★ 39,479 · May 1, 2026 · Updated last month
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★ 3,409 · Updated this week
- PyTorch native quantization and sparsity for training and inference ★ 2,847 · Updated this week
- Tools for merging pretrained large language models. ★ 7,108 · May 6, 2026 · Updated last month
- FlashInfer: Kernel Library for LLM Serving ★ 5,760 · Updated this week
- Running large language models on a single GPU for throughput-oriented scenarios. ★ 9,365 · Oct 28, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★ 22,139 · Jan 23, 2026 · Updated 4 months ago
- PyTorch extensions for high performance and large scale training. ★ 3,407 · Apr 26, 2025 · Updated last year
- Go ahead and axolotl questions ★ 12,001 · Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ★ 6,216 · Aug 22, 2025 · Updated 9 months ago
- PyTorch native post-training library ★ 5,768 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★ 7,230 · Jul 11, 2024 · Updated last year
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… ★ 14,550 · Jun 2, 2026 · Updated last week
- Code and documentation to train Stanford's Alpaca models, and generate the data. ★ 30,248 · Jul 17, 2024 · Updated last year