agrocylo / bitsandbytes-rocm
8-bit CUDA functions for PyTorch, ported to HIP for use in AMD GPUs
☆49 · Updated last year
Alternatives and similar repositories for bitsandbytes-rocm:
Users who are interested in bitsandbytes-rocm are comparing it to the libraries listed below:
- 8-bit CUDA functions for PyTorch Rocm compatible ☆39 · Updated last year
- 8-bit CUDA functions for PyTorch ☆45 · Updated last month
- 4 bits quantization of LLMs using GPTQ ☆48 · Updated last year
- DEPRECATED! ☆52 · Updated 9 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆65 · Updated last year
- Fast and memory-efficient exact attention ☆163 · Updated this week
- ☆37 · Updated last year
- An unsupervised model merging algorithm for Transformers-based language models. ☆104 · Updated 11 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- An OAI compatible exllamav2 API that's both lightweight and fast ☆873 · Updated last week
- 4 bits quantization of LLaMA using GPTQ, ported to HIP for use in AMD GPUs. ☆32 · Updated last year
- Wheels for llama-cpp-python compiled with cuBLAS support ☆96 · Updated last year
- ☆54 · Updated 9 months ago
- Web UI for ExLlamaV2 ☆486 · Updated last month
- A KoboldAI-like memory extension for oobabooga's text-generation-webui ☆108 · Updated 5 months ago
- 4 bits quantization of LLaMa using GPTQ ☆130 · Updated last year
- Low-Rank adapter extraction for fine-tuned transformers models ☆171 · Updated 10 months ago
- GPU Power and Performance Manager ☆57 · Updated 5 months ago
- A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion. ☆310 · Updated last year
- Development repository for the Triton language and compiler ☆114 · Updated this week
- An extension for oobabooga's text-generation-webui that adds syntax highlighting to code snippets ☆66 · Updated 9 months ago
- AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1 ☆200 · Updated last month
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆44 · Updated last month
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub ☆159 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 · Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction' ☆233 · Updated 10 months ago
- Text WebUI extension to add clever Notebooks to Chat mode ☆139 · Updated last year
- A discord bot with many features which uses A1111 as backend and uses my prompt templates for beautiful generations - even with short pro… ☆43 · Updated last year
- A prompt/context management system ☆169 · Updated last year
- ☆156 · Updated last year