intel / llm-scaler
☆144 · Updated last week
Alternatives and similar repositories for llm-scaler
Users interested in llm-scaler are comparing it to the libraries listed below.
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆295 · Updated this week
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- Lower Precision Floating Point Operations ☆66 · Updated last month
- No-code CLI designed for accelerating ONNX workflows ☆227 · Updated 7 months ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆590 · Updated 3 weeks ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated last week
- InferX: Inference as a Service Platform ☆156 · Updated this week
- NVIDIA Linux open GPU with P2P support ☆129 · Updated this week
- GPU Power and Performance Manager ☆66 · Updated last year
- ☆83 · Updated 3 weeks ago
- ☆90 · Updated 2 months ago
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆187 · Updated this week
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆92 · Updated this week
- Sparse Inferencing for transformer based LLMs ☆217 · Updated 5 months ago
- Aggregates compute from spare GPU capacity ☆190 · Updated last week
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 8 months ago
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆770 · Updated this week
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆30 · Updated 2 weeks ago
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ☆55 · Updated last week
- ☆53 · Updated last year
- Automatically quant GGUF models ☆219 · Updated last month
- AI Tensor Engine for ROCm ☆351 · Updated this week
- High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model di… ☆137 · Updated last week
- ☆51 · Updated 2 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆30 · Updated 8 months ago
- Get aid from local LLMs right in your PowerShell ☆15 · Updated 9 months ago
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs. ☆707 · Updated this week
- llama.cpp-gfx906 ☆90 · Updated this week
- Docs for GGUF quantization (unofficial) ☆366 · Updated 6 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆21 · Updated this week