Accessible large language models via k-bit quantization for PyTorch.
★8,197 · May 8, 2026 · Updated last week
Alternatives and similar repositories for bitsandbytes
Users that are interested in bitsandbytes are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention · ★23,836 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. · ★21,138 · May 13, 2026 · Updated last week
- QLoRA: Efficient Finetuning of Quantized LLMs · ★10,908 · Jun 10, 2024 · Updated last year
- Hackable and optimized Transformers building blocks, supporting a composable construction. · ★10,462 · Apr 21, 2026 · Updated 3 weeks ago
- An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. · ★5,060 · Apr 11, 2025 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". · ★2,305 · Mar 27, 2024 · Updated 2 years ago
- Transformer related optimization, including BERT, GPT · ★6,416 · Mar 27, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ★3,536 · Jul 17, 2025 · Updated 10 months ago
- Development repository for the Triton language and compiler · ★19,184 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. · ★42,337 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… · ★9,678 · May 7, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★80,418 · Updated this week
- Large Language Model Text Generation Inference · ★10,853 · Mar 21, 2026 · Updated last month
- Train transformer language models with reinforcement learning. · ★18,411 · Updated this week
- Ongoing research training transformer models at scale · ★16,340 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. · ★2,336 · May 11, 2025 · Updated last year
- A framework for few-shot evaluation of language models. · ★12,595 · May 11, 2026 · Updated last week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… · ★13,669 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. · ★27,836 · Updated this week
- Tensor library for machine learning · ★14,645 · Updated this week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" · ★13,517 · Dec 17, 2024 · Updated last year
- 4 bits quantization of LLaMA using GPTQ · ★3,072 · Jul 13, 2024 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models · ★1,647 · Jul 12, 2024 · Updated last year
- Instruct-tune LLaMA on consumer hardware · ★18,931 · Jul 29, 2024 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… · ★3,340 · Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. · ★39,474 · May 1, 2026 · Updated 2 weeks ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… · ★3,392 · May 7, 2026 · Updated last week
- PyTorch native quantization and sparsity for training and inference · ★2,825 · Updated this week
- Tools for merging pretrained large language models. · ★7,083 · May 6, 2026 · Updated 2 weeks ago
- FlashInfer: Kernel Library for LLM Serving · ★5,621 · Updated this week
- Running large language models on a single GPU for throughput-oriented scenarios. · ★9,368 · Oct 28, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities · ★22,128 · Jan 23, 2026 · Updated 3 months ago
- PyTorch extensions for high performance and large scale training. · ★3,406 · Apr 26, 2025 · Updated last year
- Go ahead and axolotl questions · ★11,938 · Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. · ★6,205 · Aug 22, 2025 · Updated 8 months ago
- PyTorch native post-training library · ★5,754 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ★7,229 · Jul 11, 2024 · Updated last year
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… · ★14,531 · May 8, 2026 · Updated last week
- Code and documentation to train Stanford's Alpaca models, and generate the data. · ★30,253 · Jul 17, 2024 · Updated last year