ubergarm / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆21Updated this week
Alternatives and similar repositories for ik_llama.cpp
Users who are interested in ik_llama.cpp are comparing it to the libraries listed below
- ☆83Updated this week
- Lightweight Inference server for OpenVINO☆211Updated this week
- ☆223Updated 4 months ago
- Autonomous, agentic, creative story writing system that incorporates stored embeddings and Knowledge Graphs.☆78Updated this week
- ☆28Updated 3 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs☆493Updated this week
- A persistent local memory for AI, LLMs, or Copilot in VS Code.☆142Updated last week
- A local AI companion that uses a collection of free, open source AI models in order to create two virtual companions that will follow you…☆232Updated last month
- Open source LLM UI, compatible with all local LLM providers.☆174Updated 11 months ago
- ☆176Updated last week
- InferX is an Inference Function as a Service Platform☆133Updated this week
- A Conversational Speech Generation Model with Gradio UI and OpenAI compatible API. UI and API support CUDA, MLX and CPU devices.☆201Updated 4 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning☆27Updated 4 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.☆110Updated 2 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…☆82Updated this week
- Docs for GGUF quantization (unofficial)☆258Updated 2 months ago
- Orpheus Chat WebUI☆75Updated 5 months ago
- ☆165Updated last month
- Local LLM Powered Recursive Search & Smart Knowledge Explorer☆252Updated 7 months ago
- Convert downloaded Ollama models back into their GGUF equivalent format☆57Updated 8 months ago
- The Fastest Way to Fine-Tune LLMs Locally☆320Updated 5 months ago
- Efforts toward giving Qwen 3 Coder 30B A3B proper agentic tool calling capabilities at or near 100% reliability.☆60Updated last month
- automatically quant GGUF models☆200Updated this week
- ☆209Updated last week
- ☆50Updated 6 months ago
- A local front-end for open-weight LLMs with memory, RAG, TTS/STT, Elo ratings, and dynamic research tools. Built with React and FastAPI.☆37Updated last month
- Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds — flexible model sizing and lowes…☆43Updated this week
- Eternal is an experimental platform for machine learning models and workflows.☆68Updated 6 months ago
- ☆133Updated 4 months ago
- Writing Extension for Text Generation WebUI☆63Updated last month