WapaMario63 / GPTQ-for-LLaMa-ROCm
4-bit quantization of LLaMA using GPTQ, ported to HIP for use on AMD GPUs.
☆32 · Updated last year
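To make the "4-bit quantization" the repo above implements concrete, here is a minimal sketch of storing weights in 4 bits. Note the hedge: this is plain per-row round-to-nearest quantization for illustration only; GPTQ itself additionally compensates quantization error column-by-column using second-order (Hessian) information, and all function names here are our own, not the repo's API.

```python
import numpy as np

def quantize_4bit_rtn(w: np.ndarray):
    """Per-row asymmetric round-to-nearest 4-bit quantization.

    Illustrative only -- GPTQ adds Hessian-based error compensation
    on top of a 4-bit storage format like this one.
    """
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = np.maximum((wmax - wmin) / 15.0, 1e-8)  # 16 levels: 0..15
    zero = np.round(-wmin / scale)                  # zero-point per row
    q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_4bit(q, scale, zero):
    # Map the 4-bit codes back to approximate float weights.
    return (q.astype(np.float32) - zero) * scale

w = np.random.randn(4, 64).astype(np.float32)
q, scale, zero = quantize_4bit_rtn(w)
w_hat = dequantize_4bit(q, scale, zero)
err = float(np.abs(w - w_hat).max())  # bounded by the per-row step size
```

Each weight then occupies 4 bits plus a small per-row scale and zero-point, which is where the ~4x memory saving over fp16 comes from; the ROCm/HIP port changes where the quantized matmul kernels run, not this storage scheme.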
Alternatives and similar repositories for GPTQ-for-LLaMa-ROCm:
- DEPRECATED! ☆52 · Updated 9 months ago
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆39 · Updated last year
- 8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs ☆49 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆65 · Updated last year
- ☆37 · Updated last year
- A free AI text generation interface based on KoboldAI ☆33 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 · Updated last year
- Creates a LangChain agent that uses the WebUI's API and Wikipedia ☆74 · Updated last year
- An autonomous AI agent extension for Oobabooga's web UI ☆175 · Updated last year
- 4-bit quantization of LLaMA using GPTQ ☆130 · Updated last year
- Docker configuration for koboldcpp ☆33 · Updated last year
- ☆55 · Updated last year
- Where we keep our notes about model training runs. ☆16 · Updated 2 years ago
- Just a simple HowTo for https://github.com/johnsmith0031/alpaca_lora_4bit ☆31 · Updated last year
- A fork of textgen that kept some things like Exllama and old GPTQ. ☆22 · Updated 7 months ago
- A KoboldAI-like memory extension for oobabooga's text-generation-webui ☆108 · Updated 5 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆104 · Updated 11 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- ☆156 · Updated last year
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆297 · Updated this week
- An extension for oobabooga's text-generation-webui that adds syntax highlighting to code snippets ☆66 · Updated 9 months ago
- A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion. ☆310 · Updated last year
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio ☆36 · Updated last year
- A manual for using the Tesla P40 GPU ☆121 · Updated 4 months ago
- Text WebUI extension to add clever Notebooks to Chat mode ☆139 · Updated last year
- Wheels for llama-cpp-python compiled with cuBLAS support ☆96 · Updated last year
- Memoir+, a persona memory extension for Text Gen Web UI. ☆193 · Updated this week
- A community list of common phrases generated by GPT and Claude models ☆78 · Updated last year
- CPU inference code for LLaMA models ☆137 · Updated 2 years ago
- Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation. ☆71 · Updated 2 years ago