intel / llm-scaler
☆144 · Updated last week
Alternatives and similar repositories for llm-scaler
Users interested in llm-scaler are comparing it to the libraries listed below.
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆295 · Updated this week
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- Lower Precision Floating Point Operations ☆66 · Updated last month
- No-code CLI designed for accelerating ONNX workflows ☆227 · Updated 7 months ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆590 · Updated 3 weeks ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated last week
- InferX: Inference as a Service Platform ☆156 · Updated this week
- NVIDIA Linux open GPU with P2P support ☆129 · Updated this week
- GPU Power and Performance Manager ☆66 · Updated last year
- ☆83 · Updated 3 weeks ago
- ☆90 · Updated 2 months ago
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆187 · Updated this week
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆92 · Updated this week
- Sparse Inferencing for transformer based LLMs ☆217 · Updated 5 months ago
- Aggregates compute from spare GPU capacity ☆190 · Updated last week
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 8 months ago
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆770 · Updated this week
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆30 · Updated 2 weeks ago
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ☆55 · Updated last week
- ☆53 · Updated last year
- Automatically quant GGUF models ☆219 · Updated last month
- AI Tensor Engine for ROCm ☆351 · Updated this week
- High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model di… ☆137 · Updated last week
- ☆51 · Updated 2 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆30 · Updated 8 months ago
- Get aid from local LLMs right in your PowerShell ☆15 · Updated 9 months ago
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs. ☆707 · Updated this week
- llama.cpp-gfx906 ☆90 · Updated this week
- Docs for GGUF quantization (unofficial) ☆366 · Updated 6 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆21 · Updated this week