intel / llm-scaler
☆49 · Updated last week
Alternatives and similar repositories for llm-scaler
Users interested in llm-scaler are comparing it to the repositories listed below.
- No-code CLI designed for accelerating ONNX workflows ☆214 · Updated 4 months ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, and Kokoro-TTS over OpenAI-compatible endpoints. ☆213 · Updated last week
- The official repository of SINQ: a novel, fast, high-quality quantization method designed to make any Large Language Model … ☆504 · Updated this week
- Intel® AI Assistant Builder ☆111 · Updated this week
- Run multiple resource-heavy Large Models (LMs) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆81 · Updated this week
- LLM Ripper is a framework for component extraction (embeddings, attention heads, FFNs), activation capture, functional analysis, and adap… ☆45 · Updated last week
- A curated list of OpenVINO-based AI projects ☆162 · Updated 3 months ago
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆74 · Updated this week
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform, which enables developers to take… ☆82 · Updated last week
- The HIP Environment and ROCm Kit: a lightweight open-source build system for HIP and ROCm ☆450 · Updated last week
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆90 · Updated last week
- Make PyTorch models at least run on APUs. ☆56 · Updated last year
- GPU power and performance manager ☆61 · Updated last year
- InferX: an inference-as-a-service platform ☆136 · Updated last week
- Running Microsoft's BitNet via Electron, React & Astro ☆45 · Updated 3 weeks ago
- ☆97 · Updated last month
- ☆83 · Updated 2 weeks ago
- Run LLM agents on Ryzen AI PCs in minutes ☆649 · Updated last week
- Serving LLMs in the HF Transformers format via a PyFlask API ☆71 · Updated last year
- NVIDIA Linux open GPU driver with P2P support ☆60 · Updated last week
- German "Who Wants To Be A Millionaire" LLM benchmarking ☆45 · Updated last week
- ☆52 · Updated last year
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. ☆321 · Updated last week
- Chat WebUI is an easy-to-use interface for interacting with AI, with multiple useful built-in tools such as web search … ☆46 · Updated last month
- LLM inference on consumer devices ☆124 · Updated 7 months ago
- The easiest and fastest way to run LLMs in your home lab ☆69 · Updated last month
- Simple system-tray application to monitor the status of LLM models running on Ollama ☆23 · Updated 3 months ago
- VS Code AI coding assistant powered by a self-hosted llama.cpp endpoint ☆183 · Updated 8 months ago
- llama.cpp fork used by GPT4All ☆57 · Updated 8 months ago
- AMD-related optimizations for transformer models ☆90 · Updated last month