intel / llm-scaler
☆151 · Updated this week
Alternatives and similar repositories for llm-scaler
Users interested in llm-scaler are comparing it to the repositories listed below.
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆295 · Updated this week
- No-code CLI designed for accelerating ONNX workflows ☆227 · Updated 7 months ago
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated last week
- Lower Precision Floating Point Operations ☆66 · Updated last month
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆590 · Updated 3 weeks ago
- Intel® AI Super Builder ☆159 · Updated this week
- GPU Power and Performance Manager ☆66 · Updated last year
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆770 · Updated this week
- NVIDIA Linux open GPU with P2P support ☆129 · Updated this week
- ☆90 · Updated 2 months ago
- Sparse Inferencing for transformer based LLMs ☆217 · Updated 5 months ago
- InferX: Inference as a Service Platform ☆156 · Updated this week
- ☆83 · Updated 3 weeks ago
- Aggregates compute from spare GPU capacity ☆190 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆1,605 · Updated this week
- Automatically quant GGUF models ☆219 · Updated last month
- A curated list of OpenVINO based AI projects ☆181 · Updated 7 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆21 · Updated this week
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 8 months ago
- German "Who Wants To Be A Millionaire" LLM benchmarking ☆48 · Updated last month
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆92 · Updated last week
- From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, no GPUs. ☆108 · Updated 3 months ago
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆187 · Updated this week
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆30 · Updated 2 weeks ago
- High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model di… ☆137 · Updated last week
- AI Tensor Engine for ROCm ☆351 · Updated this week
- ☆151 · Updated this week
- Make Qwen3 think like Gemini 2.5 Pro | Open WebUI function ☆25 · Updated 9 months ago
- ☆128 · Updated last year