Accessible large language models via k-bit quantization for PyTorch.
★8,168 · Apr 20, 2026 · Updated last week
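For context on the headline technique, here is a minimal absmax int8 quantization round-trip in plain Python. This is a conceptual sketch only, not the bitsandbytes API (bitsandbytes implements this idea in CUDA kernels with more elaborate schemes such as LLM.int8() and 4-bit NF4); the function names below are illustrative.

```python
# Conceptual absmax int8 quantization sketch (NOT the bitsandbytes API).
# Scales a float vector into int8 codes and dequantizes it back.

def quantize_absmax(values):
    """Map floats to int8 codes in [-127, 127] using absmax scaling."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.83, 0.45, 1.27, -1.9]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

# Per-element reconstruction error is bounded by half the scale step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
assert max_err <= scale / 2 + 1e-12
```

The k-bit generalization replaces 127 with 2^(k-1) - 1; the trade-off the libraries below navigate is between this quantization error and memory savings.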
Alternatives and similar repositories for bitsandbytes
Users interested in bitsandbytes are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention ★23,563 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★21,006 · Updated this week
- QLoRA: Efficient Finetuning of Quantized LLMs ★10,892 · Jun 10, 2024 · Updated last year
- Hackable and optimized Transformers building blocks, supporting a composable construction. ★10,437 · Apr 21, 2026 · Updated last week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ★5,053 · Apr 11, 2025 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★2,293 · Mar 27, 2024 · Updated 2 years ago
- Transformer-related optimization, including BERT and GPT ★6,412 · Mar 27, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★3,512 · Jul 17, 2025 · Updated 9 months ago
- Development repository for the Triton language and compiler ★19,040 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★42,188 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ★78,385 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,639 · Updated this week
- Large Language Model Text Generation Inference ★10,843 · Mar 21, 2026 · Updated last month
- Train transformer language models with reinforcement learning. ★18,193 · Updated this week
- Ongoing research training transformer models at scale ★16,145 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ★2,331 · May 11, 2025 · Updated 11 months ago
- A framework for few-shot evaluation of language models. ★12,331 · Apr 22, 2026 · Updated last week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ★13,487 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ★26,397 · Updated this week
- Tensor library for machine learning ★14,493 · Apr 22, 2026 · Updated last week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ★13,462 · Dec 17, 2024 · Updated last year
- 4-bit quantization of LLaMA using GPTQ ★3,071 · Jul 13, 2024 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ★1,641 · Jul 12, 2024 · Updated last year
- Instruct-tune LLaMA on consumer hardware ★18,945 · Jul 29, 2024 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ★3,291 · Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ★39,461 · Jun 2, 2025 · Updated 10 months ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization… ★3,363 · Apr 15, 2026 · Updated 2 weeks ago
- PyTorch native quantization and sparsity for training and inference ★2,796 · Updated this week
- Tools for merging pretrained large language models. ★7,023 · Mar 15, 2026 · Updated last month
- FlashInfer: Kernel Library for LLM Serving ★5,498 · Updated this week
- Running large language models on a single GPU for throughput-oriented scenarios. ★9,371 · Oct 28, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★22,107 · Jan 23, 2026 · Updated 3 months ago
- PyTorch extensions for high performance and large scale training. ★3,407 · Apr 26, 2025 · Updated last year
- Go ahead and axolotl questions ★11,779 · Updated this week
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ★6,198 · Aug 22, 2025 · Updated 8 months ago
- PyTorch native post-training library ★5,739 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★7,222 · Jul 11, 2024 · Updated last year
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… ★14,492 · Updated this week
- Code and documentation to train Stanford's Alpaca models, and generate the data. ★30,264 · Jul 17, 2024 · Updated last year