jllllll / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

☆63

Alternatives and similar repositories for exllama

Users who are interested in exllama are comparing it to the libraries listed below.

eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123 Updated 2 years ago
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆247 Updated last year
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆106 Updated last year
TheBlokeAI / AIScripts
Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub
☆160 Updated 2 years ago
the-crypt-keeper / the-muse
Experimental sampler to make LLMs more creative
☆31 Updated 2 years ago
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆108 Updated 2 years ago
danikhan632 / guidance_api
An Extension for oobabooga/text-generation-webui
☆36 Updated 2 years ago
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆177 Updated last year
kaiokendev / superbig
A prompt/context management system
☆170 Updated 2 years ago
emrgnt-cmplxty / zero-shot-replication
☆73 Updated 2 years ago
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆207 Updated last year
taprosoft / llm_finetuning
Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes…
☆146 Updated 2 years ago
jllllll / llama-cpp-python-cuBLAS-wheels
Wheels for llama-cpp-python compiled with cuBLAS support
☆97 Updated last year
theroyallab / llm-prompt-templates
Prompt Jinja2 templates for LLMs
☆34 Updated 3 months ago
PygmalionAI / training-code
The code we currently use to fine-tune models.
☆116 Updated last year
s4rduk4r / alpaca_lora_4bit_readme
Just a simple HowTo for https://github.com/johnsmith0031/alpaca_lora_4bit
☆31 Updated 2 years ago
mayank31398 / GPTQ-for-SantaCoder
4-bit quantization of SantaCoder using GPTQ
☆50 Updated 2 years ago
VatsaDev / NanoPhi-alpha
GPT-2 small trained on phi-like data
☆67 Updated last year
Hellisotherpeople / llm_steer-oobabooga
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆42 Updated last year
QuixiAI / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆239 Updated last year
CharlesMod / quantizeHFmodel
Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes.
☆38 Updated last year
bjj / exllamav2-openai-server
An OpenAI API compatible LLM inference server based on ExLlamaV2.
☆25 Updated last year
OpenAccess-AI-Collective / ggml-webui
Deploy your GGML models to HuggingFace Spaces with Docker and gradio
☆37 Updated 2 years ago
AlpinDale / sparsegpt-for-LLaMA
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
☆70 Updated 2 years ago
Dhaladom / TALIS
Simple and fast server for GPTQ-quantized LLaMA inference
☆24 Updated 2 years ago
epolewski / EricLLM
A fast batching API to serve LLM models
☆188 Updated last year
zarakiquemparte / zaraki-tools
☆26 Updated 2 years ago
thooton / muse
Let's create synthetic textbooks together :)
☆75 Updated last year
CoffeeVampir3 / ez-trainer
Train Llama LoRAs Easily
☆30 Updated 2 years ago
TheBloke / AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
☆37 Updated last year