agrocylo / bitsandbytes-rocm
8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs
☆48 · Updated last year
Alternatives and similar repositories for bitsandbytes-rocm:
Users interested in bitsandbytes-rocm are comparing it to the libraries listed below.
- 8-bit CUDA functions for PyTorch, ROCm compatible ☆39 · Updated 10 months ago
- 8-bit CUDA functions for PyTorch ☆42 · Updated this week
- 4-bit quantization of LLaMA using GPTQ, ported to HIP for use on AMD GPUs ☆32 · Updated last year
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) on Windows ZLUDA en… ☆35 · Updated 5 months ago
- An unsupervised model-merging algorithm for Transformer-based language models ☆105 · Updated 9 months ago
- ☆37 · Updated last year
- Merge Transformers language models using gradient parameters ☆205 · Updated 6 months ago
- DEPRECATED! ☆53 · Updated 8 months ago
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆190 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆157 · Updated this week
- Wheels for llama-cpp-python compiled with cuBLAS support ☆94 · Updated last year
- Automated prompting and scoring framework to evaluate LLMs using updated human-knowledge prompts ☆111 · Updated last year
- Make abliterated models with transformers, easily and quickly ☆52 · Updated 3 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆155 · Updated this week
- RAG implementation for Ooba characters. Dynamically spins up a new Qdrant vector DB and manages retrieval and commits for conversations ba… ☆46 · Updated last year
- Efficient 3-bit/4-bit quantization of LLaMA models ☆19 · Updated last year
- 4-bit quantization of LLMs using GPTQ ☆47 · Updated last year
- C/C++ implementation of PygmalionAI/pygmalion-6b ☆55 · Updated last year
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆65 · Updated last year
- Automatically quantize GGUF models ☆154 · Updated this week
- Text WebUI extension to add clever Notebooks to Chat mode ☆139 · Updated last year
- A KoboldAI-like memory extension for oobabooga's text-generation-webui ☆108 · Updated 3 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 · Updated last year
- A simple converter that converts PyTorch bin files to safetensors, intended for LLM conversion ☆58 · Updated last year
- Model REVOLVER, a human-in-the-loop model-mixing system ☆33 · Updated last year
- Simple monkeypatch to boost AMD Navi 3 GPUs ☆33 · Updated 9 months ago
- GPU Power and Performance Manager ☆55 · Updated 4 months ago
- A finetuning pipeline for instruction-tuning Raven 14B using 4-bit QLoRA and the Ditty finetuning library ☆28 · Updated 8 months ago