WapaMario63 / GPTQ-for-LLaMa-ROCm
4-bit quantization of LLaMA using GPTQ, ported to HIP for use on AMD GPUs.
☆32 · Updated last year
Related projects
Alternatives and complementary repositories for GPTQ-for-LLaMa-ROCm
- DEPRECATED! ☆53 · Updated 5 months ago
- 8-bit CUDA functions for PyTorch, ported to HIP for use on AMD GPUs ☆44 · Updated last year
- A zero-dependency web UI for any LLM backend, including KoboldCpp, OpenAI and AI Horde ☆81 · Updated this week
- A free AI text generation interface based on KoboldAI ☆32 · Updated 8 months ago
- ☆150 · Updated last year
- Add-on for the Web Search extension that provides web browsing capabilities without the need for the Extras API ☆24 · Updated 6 months ago
- An extension for oobabooga's text-generation-webui that adds syntax highlighting to code snippets ☆64 · Updated 5 months ago
- A prompt/context management system ☆165 · Updated last year
- A KoboldAI-like memory extension for oobabooga's text-generation-webui ☆107 · Updated 3 weeks ago
- Creates a LangChain agent that uses the WebUI's API and Wikipedia ☆73 · Updated last year
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆173 · Updated last month
- Web UI for ExLlamaV2 ☆445 · Updated last month
- An unsupervised model-merging algorithm for Transformer-based language models ☆100 · Updated 6 months ago
- Dynamic parameter modulation for oobabooga's text-generation-webui that adjusts generation parameters to better mirror user affect ☆34 · Updated last year
- Wheels for llama-cpp-python compiled with cuBLAS support ☆94 · Updated 9 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆124 · Updated last year
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆39 · Updated 7 months ago
- An extension for oobabooga that adds a simple memory function for chat ☆23 · Updated last year
- 100% private & simple. OSS 🐍 code interpreter for LLMs 🦙 ☆34 · Updated last year
- A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline ☆42 · Updated 9 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA's TensorRT-LLM for GPU a… ☆40 · Updated last month
- CHAracter State Management - a generative text adventure (engine) ☆61 · Updated last month
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, and EXL2 ☆126 · Updated 6 months ago
- Text WebUI extension to add clever Notebooks to Chat mode ☆133 · Updated 10 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆66 · Updated last year
- Automated prompting and scoring framework to evaluate LLMs using updated human-knowledge prompts ☆111 · Updated last year
- Diffusion_TTS extension for oobabooga ☆63 · Updated 4 months ago
- 5X faster, 60% less memory QLoRA finetuning ☆21 · Updated 5 months ago
- 4-bit quantization of LLaMA using GPTQ ☆130 · Updated last year
- HTTP proxy for on-demand model loading with llama.cpp (or other OpenAI-compatible backends) ☆41 · Updated this week