intel / llm-scaler
☆65 · Updated this week
Alternatives and similar repositories for llm-scaler
Users interested in llm-scaler are comparing it to the libraries listed below.
- No-code CLI designed for accelerating ONNX workflows ☆216 · Updated 4 months ago
- Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any Large Language Model … ☆564 · Updated last week
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding, and Rerank models over OpenAI endpoints. ☆236 · Updated last week
- Intel® AI Assistant Builder ☆117 · Updated last week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆82 · Updated last week
- LLM training in simple, raw C/HIP for AMD GPUs ☆53 · Updated last year
- On-device LLM inference powered by X-Bit Quantization ☆272 · Updated 3 months ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆83 · Updated this week
- Running Microsoft's BitNet via Electron, React & Astro ☆46 · Updated last month
- Sparse inferencing for transformer-based LLMs ☆201 · Updated 2 months ago
- LLM inference on consumer devices ☆125 · Updated 7 months ago
- High-performance text deduplication toolkit ☆59 · Updated 2 months ago
- GPU power and performance manager ☆60 · Updated last year
- ☆52 · Updated last year
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆83 · Updated last week
- llama.cpp fork used by GPT4All ☆57 · Updated 8 months ago
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆277 · Updated 2 months ago
- Enhancing LLMs with LoRA ☆173 · Updated 2 weeks ago
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. ☆403 · Updated this week
- LLM Ripper is a framework for component extraction (embeddings, attention heads, FFNs), activation capture, functional analysis, and adap… ☆45 · Updated last week
- A platform to self-host AI on easy mode ☆173 · Updated this week
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆51 · Updated 5 months ago
- The easiest & fastest way to run LLMs in your home lab ☆71 · Updated 2 months ago
- ☆40 · Updated 2 months ago
- InferX: Inference as a Service platform ☆138 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- ☆49 · Updated last month
- Wraps any OpenAI API interface as Responses with MCP support so it supports Codex, adding any missing stateful features. Ollama and vLLM… ☆124 · Updated 3 weeks ago
- Fully local, temporally aware natural-language file search on your PC, even without a GPU. Find relevant files using natural language i… ☆129 · Updated last month
- LlamaCards is a web application that provides a dynamic interface for interacting with LLM models in real time. This app allows users to … ☆39 · Updated last year