Accessible large language models via k-bit quantization for PyTorch.
⭐ 8,286 · Jun 22, 2026 · Updated this week
Alternatives and similar repositories for bitsandbytes
Users that are interested in bitsandbytes are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention · ⭐ 24,221 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. · ⭐ 21,299 · Jun 22, 2026 · Updated last week
- QLoRA: Efficient Finetuning of Quantized LLMs · ⭐ 10,940 · Jun 10, 2024 · Updated 2 years ago
- Hackable and optimized Transformers building blocks, supporting a composable construction. · ⭐ 10,509 · Jun 18, 2026 · Updated last week
- An easy-to-use LLMs quantization package with user-friendly APIs, based on GPTQ algorithm. · ⭐ 5,072 · Apr 11, 2025 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". · ⭐ 2,327 · Mar 27, 2024 · Updated 2 years ago
- Transformer related optimization, including BERT, GPT · ⭐ 6,428 · Mar 27, 2024 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ⭐ 3,578 · Jul 17, 2025 · Updated 11 months ago
- Development repository for the Triton language and compiler · ⭐ 19,525 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. · ⭐ 42,586 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… · ⭐ 9,737 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐ 83,677 · Updated this week
- Large Language Model Text Generation Inference · ⭐ 10,862 · Mar 21, 2026 · Updated 3 months ago
- Train transformer language models with reinforcement learning. · ⭐ 18,701 · Updated this week
- Ongoing research training transformer models at scale · ⭐ 16,838 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. · ⭐ 2,349 · May 11, 2025 · Updated last year
- A framework for few-shot evaluation of language models. · ⭐ 13,024 · Jun 2, 2026 · Updated 3 weeks ago
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… · ⭐ 13,941 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. · ⭐ 29,694 · Updated this week
- Tensor library for machine learning · ⭐ 14,871 · Jun 19, 2026 · Updated last week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" · ⭐ 13,614 · Dec 17, 2024 · Updated last year
- 4 bits quantization of LLaMA using GPTQ · ⭐ 3,073 · Jul 13, 2024 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models · ⭐ 1,662 · Jul 12, 2024 · Updated last year
- Instruct-tune LLaMA on consumer hardware · ⭐ 18,913 · Jul 29, 2024 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… · ⭐ 3,408 · Updated this week
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. · ⭐ 39,486 · May 1, 2026 · Updated last month
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… · ⭐ 3,426 · Jun 22, 2026 · Updated last week
- PyTorch native quantization and sparsity for training and inference · ⭐ 2,875 · Updated this week
- Tools for merging pretrained large language models. · ⭐ 7,173 · Jun 17, 2026 · Updated last week
- FlashInfer: Kernel Library for LLM Serving · ⭐ 5,867 · Updated this week
- Running large language models on a single GPU for throughput-oriented scenarios. · ⭐ 9,362 · Oct 28, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities · ⭐ 22,151 · Jan 23, 2026 · Updated 5 months ago
- PyTorch extensions for high performance and large scale training. · ⭐ 3,409 · Apr 26, 2025 · Updated last year
- Go ahead and axolotl questions · ⭐ 12,082 · Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. · ⭐ 6,224 · Aug 22, 2025 · Updated 10 months ago
- PyTorch native post-training library · ⭐ 5,777 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ⭐ 7,233 · Jul 11, 2024 · Updated last year
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… · ⭐ 14,572 · Jun 13, 2026 · Updated 2 weeks ago
- Code and documentation to train Stanford's Alpaca models, and generate the data. · ⭐ 30,249 · Jul 17, 2024 · Updated last year