ROCm / bitsandbytes
8-bit CUDA functions for PyTorch
☆53 · Updated 3 weeks ago
Alternatives and similar repositories for bitsandbytes
Users who are interested in bitsandbytes are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention ☆177 · Updated this week
- Development repository for the Triton language and compiler ☆125 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- AI Tensor Engine for ROCm ☆232 · Updated this week
- AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆12 · Updated last year
- DLPrimitives/OpenCL out-of-tree backend for PyTorch ☆356 · Updated 10 months ago
- The HIP Environment and ROCm Kit: a lightweight open-source build system for HIP and ROCm ☆222 · Updated this week
- Hackable and optimized Transformers building blocks, supporting a composable construction. ☆31 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆191 · Updated this week
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆43 · Updated 10 months ago
- AMD-related optimizations for transformer models ☆80 · Updated 3 weeks ago
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ☆24 · Updated this week
- AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs. ☆555 · Updated last week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆651 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆382 · Updated this week
- ☆111 · Updated last week
- Ahead of Time (AOT) Triton Math Library ☆70 · Updated this week
- Fast low-bit matmul kernels in Triton ☆330 · Updated last week
- An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆255 · Updated 8 months ago
- AMD SMI ☆78 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆437 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆110 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆135 · Updated this week
- ☆139 · Updated 3 weeks ago
- Ongoing research training transformer models at scale ☆24 · Updated last week
- A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch ☆22 · Updated this week
- oneCCL Bindings for PyTorch* ☆99 · Updated this week
- Deep Learning Primitives and Mini-Framework for OpenCL ☆199 · Updated 10 months ago
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆101 · Updated 2 months ago
- AMD's graph optimization engine. ☆228 · Updated this week