intel / llm-scaler
☆140 · Updated this week
Alternatives and similar repositories for llm-scaler
Users interested in llm-scaler are comparing it to the repositories listed below.
- No-code CLI designed for accelerating ONNX workflows ☆226 · Updated 7 months ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆283 · Updated last week
- Lower Precision Floating Point Operations ☆65 · Updated 3 weeks ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆91 · Updated last week
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated this week
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆176 · Updated this week
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆588 · Updated 2 weeks ago
- NVIDIA Linux open GPU with P2P support ☆119 · Updated last month
- ☆53 · Updated last year
- Sparse Inferencing for transformer based LLMs ☆218 · Updated 5 months ago
- A curated list of OpenVINO based AI projects ☆179 · Updated 7 months ago
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆28 · Updated last week
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆742 · Updated last week
- AI Tensor Engine for ROCm ☆344 · Updated this week
- Build AI agents for your PC ☆894 · Updated last week
- Fully Open Language Models with Stellar Performance ☆317 · Updated 2 months ago
- AMD related optimizations for transformer models ☆97 · Updated 3 months ago
- Intel® AI Super Builder ☆153 · Updated this week
- ☆36 · Updated this week
- ☆75 · Updated last week
- llama.cpp fork used by GPT4All ☆55 · Updated 11 months ago
- ☆51 · Updated last month
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs. ☆665 · Updated last week
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 8 months ago
- ☆135 · Updated last week
- From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, No GPUs. ☆108 · Updated 2 months ago
- Developer kits reference setup scripts for various kinds of Intel platforms and GPUs ☆41 · Updated this week
- 8-bit CUDA functions for PyTorch ☆70 · Updated 4 months ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆144 · Updated 5 months ago