ROCm / bitsandbytes
8-bit CUDA functions for PyTorch
☆38 · Updated last week
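For context, bitsandbytes is typically used as a drop-in replacement for standard PyTorch optimizers and layers. Below is a minimal sketch of the upstream 8-bit optimizer usage (assuming a working GPU build of `bitsandbytes`; the exact API surface of this ROCm fork may differ):

```python
# Minimal sketch: swapping torch.optim.Adam for the 8-bit Adam from
# bitsandbytes to cut optimizer-state memory. Model and data are toy
# placeholders; assumes a CUDA/ROCm-capable bitsandbytes build.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)  # instead of torch.optim.Adam

x = torch.randn(16, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```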
Related projects
Alternatives and complementary repositories for bitsandbytes
- Fast and memory-efficient exact attention ☆139 · Updated this week
- Development repository for the Triton language and compiler (see the Triton kernel sketch after this list) ☆93 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (see the offline-inference sketch after this list) ☆89 · Updated this week
- AMD-related optimizations for transformer models ☆57 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆45 · Updated this week
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆11 · Updated 4 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆173 · Updated 4 months ago
- Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t… ☆248 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆253 · Updated last month
- GPTQ inference Triton kernel ☆284 · Updated last year
- PB-LLM: Partially Binarized Large Language Models ☆148 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last month
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆172 · Updated this week
- This repository contains the experimental PyTorch-native float8 training UX ☆211 · Updated 3 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆187 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated 2 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆262 · Updated last year
- Hackable and optimized Transformers building blocks, supporting a composable construction ☆20 · Updated last week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆153 · Updated this week
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 · Updated last year
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆63 · Updated this week
- Get down and dirty with FlashAttention 2.0 in PyTorch; plug and play, no complex CUDA kernels ☆98 · Updated last year
- Ring-attention experiments ☆97 · Updated last month
- Efficient 3-bit/4-bit quantization of LLaMA models ☆19 · Updated last year
- Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang ☆125 · Updated this week
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆350 · Updated 8 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5) ☆209 · Updated 3 weeks ago
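
Since the Triton repository appears in the list above, here is the canonical vector-add kernel from Triton's introductory tutorial, as a sketch of what the language looks like (assumes `triton` and a GPU-enabled PyTorch install; the block size of 1024 is an arbitrary choice):

```python
# Sketch of a Triton kernel: each program instance handles one
# BLOCK_SIZE-wide slice of the input, with masking at the tail.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # one program per block of 1024
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```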
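Likewise, since vLLM (and what appear to be forks of it) shows up several times above, a minimal offline-inference sketch against the upstream vLLM API (the model name is only an example; ROCm forks may lag the upstream interface):

```python
# Sketch: offline batched generation with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model, swap for your own
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["8-bit quantization reduces memory because"], params)
print(outputs[0].outputs[0].text)
```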