agrocylo / bitsandbytes-rocmLinks

8-bit CUDA functions for PyTorch, ported to HIP for use in AMD GPUs

☆51

Alternatives and similar repositories for bitsandbytes-rocm

Users that are interested in bitsandbytes-rocm are comparing it to the libraries listed below

Sorting:

arlo-phoenix / bitsandbytes-rocm-5.6
8-bit CUDA functions for PyTorch Rocm compatible
☆41Updated last year
ROCm / bitsandbytes
8-bit CUDA functions for PyTorch
☆66Updated last month
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆247Updated last year
jllllll / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆63Updated 2 years ago
0cc4m / GPTQ-for-LLaMa
4 bits quantization of LLMs using GPTQ
☆49Updated 2 years ago
nktice / AMD-AI
AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1
☆212Updated this week
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆106Updated last year
broncotc / bitsandbytes-rocm
☆37Updated 2 years ago
harrisonvanderbyl / rwkv-cpp-accelerated
A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…
☆313Updated last year
jllllll / llama-cpp-python-cuBLAS-wheels
Wheels for llama-cpp-python compiled with cuBLAS support
☆97Updated last year
xor2k / gpu_undervolt
☆42Updated 2 years ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆90Updated this week
pointnetwork / point-alpaca
☆403Updated 2 years ago
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆108Updated 2 years ago
tsl0922 / pytorch-gfx803
Run stable-diffusion-webui with Radeon RX 580 8GB on Ubuntu 22.04.2 LTS
☆67Updated last year
wawawario2 / long_term_memory
A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion.
☆308Updated 2 years ago
aikitoria / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆66Updated 2 weeks ago
FailSpy / abliterator
Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens
☆518Updated last year
johnsmith0031 / alpaca_lora_4bit
☆534Updated last year
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆207Updated last year
xuhuisheng / rocm-gfx803
☆234Updated 2 years ago
turboderp-org / exui
Web UI for ExLlamaV2
☆511Updated 8 months ago
iantbutler01 / rwkv-raven-qlora-4bit-instruct
A finetuning pipeline for instruct tuning Raven 14bn using QLORA 4bit and the Ditty finetuning library
☆28Updated last year
0cc4m / KoboldAI
☆157Updated 2 years ago
randaller / llama-cpu
Inference on CPU code for LLaMA models
☆137Updated 2 years ago
ROCm / flash-attention
Fast and memory-efficient exact attention
☆194Updated last week
Orion-zhen / abliteration
Make abliterated models with transformers, easy and fast
☆90Updated 6 months ago
aspctu / alpaca-lora
Instruct-tuning LLaMA on consumer hardware
☆65Updated 2 years ago
eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123Updated 2 years ago
theroyallab / tabbyAPI
The official API server for Exllama. OAI compatible, lightweight, and fast.
☆1,068Updated last week