jllllll / GPTQ-for-LLaMa-Wheels
Precompiled Wheels for GPTQ-for-LLaMa
☆18 · Updated last year
Alternatives and similar repositories for GPTQ-for-LLaMa-Wheels:
Users who are interested in GPTQ-for-LLaMa-Wheels are comparing it to the libraries listed below.
- A KoboldAI-like memory extension for oobabooga's text-generation-webui ☆108 · Updated 5 months ago
- A combination of Oobabooga's fork and the main CUDA branch of GPTQ-for-LLaMa in a package format. ☆22 · Updated last year
- ☆27 · Updated last year
- 4-bit quantization of LLMs using GPTQ ☆49 · Updated last year
- Oobabooga extension for Bark TTS ☆118 · Updated last year
- Train Llama LoRAs easily ☆31 · Updated last year
- Just a simple HowTo for https://github.com/johnsmith0031/alpaca_lora_4bit ☆31 · Updated last year
- oobabooga extension - Experimental sampler to make LLMs more creative ☆23 · Updated last year
- A simple extension that uses Bark Text-to-Speech for audio output ☆35 · Updated last year
- 8-bit CUDA functions for PyTorch ☆44 · Updated last year
- 4-bit quantization of LLaMa using GPTQ ☆130 · Updated last year
- Training PRO extension for oobabooga WebUI - recent dev version ☆48 · Updated 3 months ago
- Text WebUI extension to add clever Notebooks to Chat mode ☆139 · Updated last year
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. ☆38 · Updated last year
- A simple converter which converts PyTorch bin files to safetensors, intended to be used for LLM conversion. ☆65 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 · Updated last year
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆46 · Updated last year
- A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion. ☆310 · Updated last year
- Fast and memory-efficient exact attention - Windows wheels ☆33 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆41 · Updated 2 years ago
- Efficient 3-bit/4-bit quantization of LLaMA models ☆19 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆64 · Updated last year
- Instruct-tune LLaMA on consumer hardware ☆74 · Updated last year
- An OpenAI API compatible LLM inference server based on ExLlamaV2. ☆25 · Updated last year
- Model REVOLVER, a human in the loop model mixing system. ☆33 · Updated last year
- Annoy long-term memory experiment for oobabooga/text-generation-webui ☆31 · Updated last year
- An unsupervised model merging algorithm for Transformers-based language models. ☆105 · Updated 11 months ago
- ChatGPT-like Web UI for RWKVstic ☆100 · Updated 2 years ago
- XTTSv2 Extension for oobabooga text-generation-webui ☆152 · Updated last year
- Make abliterated models with transformers, easy and fast ☆67 · Updated last week