WapaMario63 / GPTQ-for-LLaMa-ROCm
4-bit quantization of LLaMA using GPTQ, ported to HIP for use on AMD GPUs.
☆32 Updated last year
Alternatives and similar repositories for GPTQ-for-LLaMa-ROCm:
Users that are interested in GPTQ-for-LLaMa-ROCm are comparing it to the libraries listed below
- 8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs ☆48 Updated last year
- A fork of textgen that kept some things like Exllama and old GPTQ. ☆22 Updated 6 months ago
- DEPRECATED! ☆53 Updated 8 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 Updated last year
- No-messing-around sh client for llama.cpp's server ☆31 Updated 6 months ago
- 4-bit quantization of LLaMA using GPTQ ☆131 Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 Updated last year
- Wheels for llama-cpp-python compiled with cuBLAS support ☆94 Updated last year
- Creates a LangChain agent that works using the WebUI's API and Wikipedia ☆73 Updated last year
- A KoboldAI-like memory extension for oobabooga's text-generation-webui ☆108 Updated 3 months ago
- Simple and fast server for GPTQ-quantized LLaMA inference ☆24 Updated last year
- An autonomous AI agent extension for Oobabooga's web UI ☆176 Updated last year
- RAG implementation for Ooba characters. Dynamically spins up new qdrant vector DB and manages retrieval and commits for conversations ba… ☆46 Updated last year
- 8-bit CUDA functions for PyTorch, ROCm compatible ☆39 Updated 10 months ago
- Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts ☆111 Updated last year
- Experimental LLM Inference UX to aid in creative writing ☆112 Updated 2 months ago
- ☆28 Updated last year
- Hosts a GPTQ model using AutoGPTQ as an API that is compatible with the text generation UI API. ☆91 Updated last year
- oobabooga text-generation-webui implementation of wafflecomposite - langchain-ask-pdf-local ☆69 Updated last year
- A simple webui for stable-diffusion.cpp ☆24 Updated this week
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆53 Updated this week
- Web UI for ExLlamaV2 ☆480 Updated 2 weeks ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆65 Updated last year
- GPU Power and Performance Manager ☆55 Updated 4 months ago
- ☆37 Updated last year
- A fast batching API to serve LLM models ☆180 Updated 9 months ago
- A prompt/context management system ☆169 Updated last year
- ☆153 Updated last year
- Extension for Text Generation Webui based on EdgeGPT, a reverse engineered API of Microsoft's Bing Chat AI ☆125 Updated last year
- Generate Large Language Model text in a container. ☆20 Updated last year