intel / llm-scaler
☆115 · Updated this week
Alternatives and similar repositories for llm-scaler
Users interested in llm-scaler are comparing it to the libraries listed below.
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI-compatible endpoints. ☆270 · Updated last week
- No-code CLI designed to accelerate ONNX workflows ☆221 · Updated 7 months ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆585 · Updated 2 weeks ago
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- The HIP Environment and ROCm Kit - a lightweight open-source build system for HIP and ROCm ☆690 · Updated this week
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆29 · Updated 5 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆21 · Updated 2 weeks ago
- Intel® AI Assistant Builder ☆140 · Updated this week
- Lower Precision Floating Point Operations ☆59 · Updated this week
- Run multiple resource-heavy Large Models (LMs) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆86 · Updated this week
- GPU Power and Performance Manager ☆64 · Updated last year
- InferX: Inference as a Service Platform ☆146 · Updated this week
- Sparse inferencing for transformer-based LLMs ☆216 · Updated 5 months ago
- ☆87 · Updated last month
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ☆41 · Updated this week
- Aggregates compute from spare GPU capacity ☆183 · Updated last week
- AI Tensor Engine for ROCm ☆330 · Updated this week
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆159 · Updated this week
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. ☆591 · Updated last week
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform, which enables developers to take… ☆89 · Updated 3 weeks ago
- ☆53 · Updated last year
- Run Generative AI models with a simple C++/Python API using OpenVINO Runtime ☆410 · Updated this week
- llama.cpp-gfx906 ☆82 · Updated this week
- ☆430 · Updated last month
- LLM Fine-Tuning Toolbox images for Ryzen AI 395+ Strix Halo ☆41 · Updated 3 months ago
- NVIDIA Linux open GPU kernel modules with P2P support ☆103 · Updated last month
- Make Qwen3 Think like Gemini 2.5 Pro | Open WebUI function ☆25 · Updated 8 months ago
- llama.cpp fork used by GPT4All ☆55 · Updated 10 months ago
- ☆48 · Updated 2 years ago
- From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, no GPUs. ☆107 · Updated 2 months ago