WapaMario63 / GPTQ-for-LLaMa-ROCm
4-bit quantization of LLaMA using GPTQ, ported to HIP for use on AMD GPUs.
☆32 · Updated last year
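The repository above applies GPTQ to produce 4-bit LLaMA weights. As a rough illustration of what "4-bit quantization" means at the tensor level, here is a minimal round-to-nearest sketch with per-group scales and zero-points. This is an assumption-laden simplification for intuition only: real GPTQ additionally uses Hessian-based error compensation, and the function names here (`quantize_4bit`, `dequantize_4bit`) are hypothetical, not the repo's API.

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    # Simplified round-to-nearest 4-bit quantization (NOT the full GPTQ
    # algorithm, which also compensates quantization error column by column).
    # Each group of `group_size` weights shares one scale and zero-point.
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0  # 4 bits -> 16 levels, codes 0..15
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, w_min):
    # Reconstruct approximate float weights from the 4-bit codes.
    return q.astype(np.float32) * scale + w_min

# Usage: quantize 256 random weights and check the worst-case error,
# which is bounded by half a quantization step per group.
rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s, z = quantize_4bit(w)
err = np.abs(dequantize_4bit(q, s, z).ravel() - w).max()
```

In practice, two 4-bit codes are packed into each byte for storage, and the HIP port's contribution is running the quantized matmul kernels on ROCm rather than CUDA.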
Alternatives and similar repositories for GPTQ-for-LLaMa-ROCm
Users interested in GPTQ-for-LLaMa-ROCm are comparing it to the libraries listed below.
- DEPRECATED! ☆52 · Updated last year
- Web UI for ExLlamaV2 ☆503 · Updated 5 months ago
- 8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs ☆50 · Updated 2 years ago
- ☆37 · Updated 2 years ago
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆209 · Updated 4 months ago
- A fork of vLLM enabling Pascal-architecture GPUs ☆28 · Updated 4 months ago
- A manual for using the Tesla P40 GPU ☆126 · Updated 8 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- A fork of textgen that retains features such as Exllama and old GPTQ ☆22 · Updated 10 months ago
- A prompt/context management system ☆170 · Updated 2 years ago
- ☆80 · Updated this week
- A fast batching API for serving LLMs ☆183 · Updated last year
- A free AI text-generation interface based on KoboldAI ☆34 · Updated last year
- ☆535 · Updated last year
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆333 · Updated 3 weeks ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2 ☆156 · Updated last year
- A community list of common phrases generated by GPT and Claude models ☆77 · Updated last year
- Automated prompting and scoring framework to evaluate LLMs using updated human-knowledge prompts ☆110 · Updated last year
- A Gradio web UI for running large language models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion ☆309 · Updated last year
- Stable Diffusion and Flux in pure C/C++ ☆20 · Updated this week
- My personal fork of koboldcpp where I hack in experimental samplers ☆46 · Updated last year
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆41 · Updated last year
- ☆155 · Updated 5 months ago
- ☆158 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers (QLoRA) ☆123 · Updated 2 years ago
- Dolphin System Messages ☆320 · Updated 5 months ago
- Docker configuration for koboldcpp ☆34 · Updated last year
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ☆1,000 · Updated this week
- A daemon that automatically manages the performance states of NVIDIA GPUs ☆89 · Updated last month
- LLaMa retrieval plugin script using OpenAI's retrieval plugin ☆324 · Updated 2 years ago