ubergarm / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance

☆21

Alternatives and similar repositories for ik_llama.cpp

Users who are interested in ik_llama.cpp are comparing it to the repositories listed below.

theroyallab / YALS
☆90 Updated last month
turboderp-org / exllamav3
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
☆626 Updated last week
Thireus / GGUF-Tool-Suite
Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha…
☆76 Updated this week
SlerpE / highCompute.py
☆27 Updated 7 months ago
Lanerra / saga
Autonomous, agentic, creative story writing system that incorporates stored embeddings and Knowledge Graphs.
☆92 Updated this week
SearchSavior / OpenArc
Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.
☆295 Updated this week
Viceman256 / TensorTune
KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning
☆30 Updated 8 months ago
boneylizard / Eloquent
The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra…
☆55 Updated this week
atineiatte / deep-research-at-home
☆230 Updated 9 months ago
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆88 Updated last week
savantskie / persistent-ai-memory
A persistent local memory for AI, LLMs, or Copilot in VS Code.
☆191 Updated 3 months ago
TesslateAI / Agent-Builder
☆205 Updated 5 months ago
masterFoad / NanoSage
Local LLM Powered Recursive Search & Smart Knowledge Explorer
☆260 Updated 3 months ago
Nexesenex / croco.cpp
Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati…
☆156 Updated this week
PkmX / orpheus-chat-webui
Orpheus Chat WebUI
☆76 Updated 10 months ago
TesslateAI / TFrameX
☆178 Updated 5 months ago
wsmlby / homl
The easiest & fastest way to run LLMs in your home lab
☆80 Updated 2 months ago
mattjamo / OllamaToGGUF
Convert downloaded Ollama models back into their GGUF equivalent format
☆71 Updated last year
monkesearch / monkeSearch
Fully local, temporally aware natural language file search on your PC, even without a GPU. Find relevant files using natural language i…
☆166 Updated last month
rombodawg / Easy_training
☆51 Updated 11 months ago
SingularityMan / vector_companion
A local AI companion that uses a collection of free, open source AI models in order to create two virtual companions that will follow you…
☆240 Updated 3 months ago
k-koehler / gguf-tensor-overrider
☆51 Updated 3 months ago
chigkim / Ollama-MMLU-Pro
☆109 Updated 5 months ago
leafspark / AutoGGUF
Automatically quantize GGUF models
☆219 Updated last month
akashjss / sesame-csm
A Conversational Speech Generation Model with Gradio UI and OpenAI compatible API. UI and API support CUDA, MLX and CPU devices.
☆211 Updated 9 months ago
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆157 Updated 7 months ago
cduk / vllm-pascal
A fork of vLLM enabling Pascal architecture GPUs
☆32 Updated 11 months ago
avarayr / suaveui
Open source LLM UI, compatible with all local LLM providers.
☆177 Updated last year
inferx-net / inferx
InferX: Inference as a Service Platform
☆156 Updated this week
iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆366 Updated 6 months ago