turboderp-org / exllamav3Links

An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs

☆586

Alternatives and similar repositories for exllamav3

Users that are interested in exllamav3 are comparing it to the libraries listed below

Sorting:

theroyallab / tabbyAPI
The official API server for Exllama. OAI compatible, lightweight, and fast.
☆1,094Updated last week
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆1,341Updated last week
theroyallab / YALS
☆86Updated 2 weeks ago
matt-c1 / llama-3-quant-comparison
Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.
☆165Updated last year
turboderp-org / exui
Web UI for ExLlamaV2
☆514Updated 9 months ago
aphrodite-engine / aphrodite-engine
Large-scale LLM inference engine
☆1,600Updated last week
lmg-anon / mikupad
LLM Frontend in a single html file
☆669Updated 2 weeks ago
mostlygeek / llama-swap
Reliable model swapping for any local OpenAI compatible server - llama.cpp, vllm, etc
☆1,933Updated last week
sam-paech / antislop-sampler
☆330Updated 4 months ago
jd-3d / SOLOBench
☆135Updated 7 months ago
leafspark / AutoGGUF
automatically quant GGUF models
☆217Updated last month
iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆324Updated 4 months ago
LostRuins / datasetexplorer
Easily view and modify JSON datasets for large language models
☆84Updated 6 months ago
itsme2417 / PolyMind
A multimodal, function calling powered LLM webui.
☆217Updated last year
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆47Updated last month
epolewski / EricLLM
A fast batching API to serve LLM models
☆189Updated last year
FailSpy / abliterator
Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens
☆536Updated last year
CerebrasResearch / reap
REAP: Router-weighted Expert Activation Pruning for SMoE compression
☆129Updated 3 weeks ago
chigkim / Ollama-MMLU-Pro
☆107Updated 3 months ago
SearchSavior / OpenArc
Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.
☆254Updated this week
matatonic / openedai-vision
An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal.
☆266Updated 8 months ago
avarayr / suaveui
Open source LLM UI, compatible with all local LLM providers.
☆176Updated last year
TheProxyCompany / proxy-structuring-engine
Guaranteed Structured Output from any Language Model via Hierarchical State Machines
☆145Updated last month
SingularityMan / vector_companion
A local AI companion that uses a collection of free, open source AI models in order to create two virtual companions that will follow you…
☆236Updated last month
sasha0552 / nvidia-pstated
A daemon that automatically manages the performance states of NVIDIA GPUs.
☆100Updated last month
matteoserva / GraphLLM
☆209Updated 2 months ago
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆146Updated 4 months ago
Nexesenex / croco.cpp
Croco.Cpp is fork of KoboldCPP infering GGML/GGUF models on CPU/Cuda with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati…
☆153Updated this week
tdrussell / qlora-pipe
A pipeline parallel training script for LLMs.
☆163Updated 7 months ago
NimbleEdge / sparse_transformers
Sparse Inferencing for transformer based LLMs
☆215Updated 3 months ago