fredlas / optimize_llamacpp_ngl
empirically chooses -ngl param for llama.cpp
☆16 · Updated 6 months ago
Alternatives and similar repositories for optimize_llamacpp_ngl
Users who are interested in optimize_llamacpp_ngl are comparing it to the repositories listed below.
- ☆62 · Updated 2 months ago
- Analyze Reddit posts ☆25 · Updated 6 months ago
- BlinkDL's RWKV-v4 running in the browser ☆47 · Updated 2 years ago
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆24 · Updated last month
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 9 months ago
- A fast RWKV Tokenizer written in Rust ☆53 · Updated last month
- An OpenAI API compatible LLM inference server based on ExLlamaV2. ☆25 · Updated last year
- stable-diffusion.cpp bindings for python ☆64 · Updated last week
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. ☆38 · Updated last year
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆42 · Updated 2 weeks ago
- Deploy Apollo HF space locally ☆40 · Updated 9 months ago
- Modified Beam Search with periodical restart ☆12 · Updated last year
- Simple LLM inference server ☆20 · Updated last year
- Game Companion AI is an advanced application designed to enhance the gaming experience by providing real-time analysis and interpretation… ☆53 · Updated 11 months ago
- ☆23 · Updated 11 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆43 · Updated last year
- Generate Structured JSON with probs from Language Models ☆17 · Updated 6 months ago
- Port of Facebook's LLaMA model in C/C++ ☆22 · Updated last year
- MilimoChat: Privacy-first, self-hosted AI chat with customizable personas, context-aware memory, and local analytics. Built on Python/Str… ☆14 · Updated 6 months ago
- run ollama & gguf easily with a single command ☆52 · Updated last year
- ☆42 · Updated 2 weeks ago
- cli tool to quantize gguf, gptq, awq, hqq and exl2 models ☆75 · Updated 9 months ago
- Pressure testing the context window of open LLMs ☆25 · Updated last year
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 4 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆33 · Updated last week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆146 · Updated 7 months ago
- ☆24 · Updated 8 months ago
- Lightweight continuous batching OpenAI compatibility using HuggingFace Transformers include T5 and Whisper. ☆27 · Updated 6 months ago
- The hearth of The Pulsar App, fast, secure and shared inference with modern UI ☆57 · Updated 9 months ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆16 · Updated last year