agrocylo / bitsandbytes-rocm
8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs
☆51 · Updated 2 years ago
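As a rough illustration of the 8-bit quantization idea behind bitsandbytes and several of the libraries below (a minimal absmax sketch in NumPy, not the library's actual blockwise CUDA/HIP kernels):

```python
# Minimal absmax int8 quantization sketch (illustrative only; bitsandbytes
# uses blockwise scaling and fused GPU kernels for this).
import numpy as np

def quantize_absmax(x):
    """Map floats to int8 using a single absmax-derived scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float array from the int8 values."""
    return q.astype(np.float32) * scale

x = np.random.randn(256).astype(np.float32)
q, scale = quantize_absmax(x)
x_hat = dequantize(q, scale)
# Round-to-nearest bounds the per-element error by half a quantization step.
assert np.max(np.abs(x - x_hat)) <= scale / 2 + 1e-6
```

Storing `q` plus one scale per block is what cuts memory roughly 4x versus fp32; the ROCm ports listed here reimplement the same arithmetic in HIP.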
Alternatives and similar repositories for bitsandbytes-rocm
Users interested in bitsandbytes-rocm are comparing it to the libraries listed below.
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆41 · Updated last year
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆209 · Updated 5 months ago
- ☆37 · Updated 2 years ago
- 4-bit quantization of LLMs using GPTQ ☆49 · Updated 2 years ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆64 · Updated last year
- ☆534 · Updated last year
- ☆158 · Updated last year
- 8-bit CUDA functions for PyTorch ☆54 · Updated last month
- 4-bit quantization of LLaMA using GPTQ ☆130 · Updated 2 years ago
- KoboldAI is generative AI software optimized for fictional use, but capable of much more! ☆414 · Updated 6 months ago
- Web UI for ExLlamaV2 ☆505 · Updated 6 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆106 · Updated last year
- ☆42 · Updated 2 years ago
- A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependencies ☆312 · Updated last year
- 4-bit quantization of LLaMA using GPTQ, ported to HIP for use on AMD GPUs. ☆32 · Updated last year
- Wheels for llama-cpp-python compiled with cuBLAS support ☆98 · Updated last year
- Fast and memory-efficient exact attention ☆180 · Updated this week
- CPU inference code for LLaMA models ☆137 · Updated 2 years ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts ☆110 · Updated 2 years ago
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI. ☆128 · Updated 2 years ago
- ☆405 · Updated 2 years ago
- Efficient 3-bit/4-bit quantization of LLaMA models ☆19 · Updated 2 years ago
- A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion. ☆307 · Updated last year
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ☆1,020 · Updated this week
- Merge Transformers language models by use of gradient parameters. ☆206 · Updated last year
- A finetuning pipeline for instruct-tuning Raven 14B using QLoRA 4-bit and the Ditty finetuning library ☆29 · Updated last year
- Extend the original llama.cpp repo to support the RedPajama model. ☆118 · Updated 11 months ago
- ☆549 · Updated 9 months ago