agrocylo / bitsandbytes-rocm
8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs
☆50 · Updated 2 years ago
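Since the port's stated goal is to bring bitsandbytes' 8-bit functions to HIP, a drop-in 8-bit optimizer swap should look the same as it does on CUDA. A minimal sketch, assuming the port builds cleanly and keeps the upstream `bnb.optim.Adam8bit` API (PyTorch's ROCm builds expose AMD GPUs through the `cuda` device type):

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb  # assumes the ROCm port installs under the usual name

# A plain model; .cuda() targets the AMD GPU on ROCm builds of PyTorch.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 10)).cuda()

# Drop-in replacement for torch.optim.Adam that stores optimizer state in 8 bits,
# which is where most of the memory savings come from.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

x = torch.randn(4, 768, device="cuda")
loss = model(x).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```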
Alternatives and similar repositories for bitsandbytes-rocm
Users interested in bitsandbytes-rocm are comparing it to the libraries listed below
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆41 · Updated last year
- 8-bit CUDA functions for PyTorch ☆53 · Updated 3 weeks ago
- A torch-less C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependencies ☆312 · Updated last year
- ☆37 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆177 · Updated this week
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆101 · Updated 2 months ago
- Wheels for llama-cpp-python compiled with cuBLAS support ☆97 · Updated last year
- Web UI for ExLlamaV2 ☆503 · Updated 5 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆105 · Updated last year
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆209 · Updated 4 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆64 · Updated last year
- ☆54 · Updated last year
- DEPRECATED! ☆52 · Updated last year
- 4-bit quantization of LLMs using GPTQ ☆49 · Updated last year
- Make abliterated models with transformers, easy and fast ☆79 · Updated 3 months ago
- A simple converter that converts PyTorch bin files to safetensors, intended for LLM conversion (see the conversion sketch after this list). ☆69 · Updated last year
- Efficient 3-bit/4-bit quantization of LLaMA models ☆19 · Updated 2 years ago
- Automatically quantize GGUF models ☆187 · Updated this week
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆156 · Updated last year
- AMD-related optimizations for transformer models ☆80 · Updated 3 weeks ago
- ChatGPT-like Web UI for RWKVstic ☆100 · Updated 2 years ago
- Make PyTorch models at least run on APUs. ☆54 · Updated last year
- Run stable-diffusion-webui with a Radeon RX 580 8GB on Ubuntu 22.04.2 LTS ☆64 · Updated last year
- CPU inference code for LLaMA models ☆137 · Updated 2 years ago
- ☆535 · Updated last year
- 4-bit quantization of LLaMA using GPTQ, ported to HIP for use on AMD GPUs. ☆32 · Updated last year
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆436 · Updated this week
- Merge Transformers language models using gradient parameters. ☆206 · Updated 11 months ago
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in a Windows ZLUDA environment ☆43 · Updated 10 months ago
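For the bin-to-safetensors converter mentioned above, the core operation is small enough to sketch directly against the `safetensors` library. This is a generic illustration with placeholder file names, not the listed repo's code:

```python
import torch
from safetensors.torch import save_file

# Load the original checkpoint onto the CPU; file names here are placeholders.
state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)

# safetensors refuses tensors that share storage, so give each tensor its own memory.
state_dict = {k: v.contiguous().clone() for k, v in state_dict.items()}

save_file(state_dict, "model.safetensors")
```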