intel / llm-scaler
☆43Updated this week
Alternatives and similar repositories for llm-scaler
Users who are interested in llm-scaler are comparing it to the libraries listed below.
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…☆82Updated last week
- Lightweight Inference server for OpenVINO☆211Updated this week
- Intel® AI Assistant Builder☆106Updated this week
- LLM Ripper is a framework for component extraction (embeddings, attention heads, FFNs), activation capture, functional analysis, and adap…☆46Updated this week
- No-code CLI designed for accelerating ONNX workflows☆214Updated 3 months ago
- GPU Power and Performance Manager☆61Updated 11 months ago
- LlamaCards is a web application that provides a dynamic interface for interacting with LLM models in real-time. This app allows users to …☆39Updated last year
- Running Microsoft's BitNet via Electron, React & Astro☆44Updated 3 months ago
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati…☆28Updated 4 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.☆42Updated 2 weeks ago
- NVIDIA Linux open GPU with P2P support☆54Updated this week
- Serving LLMs in the HF-Transformers format via a PyFlask API☆71Updated last year
- ☆100Updated last month
- Sparse Inferencing for transformer based LLMs☆198Updated last month
- Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds — flexible model sizing and lowes…☆45Updated this week
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search …☆45Updated 3 weeks ago
- Autonomous, agentic, creative story writing system that incorporates stored embeddings and Knowledge Graphs.☆79Updated this week
- Locally running LLM with internet access☆96Updated 2 months ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take…☆78Updated this week
- High-Performance Text Deduplication Toolkit☆56Updated last month
- Enhancing LLMs with LoRA☆137Updated 2 weeks ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…☆39Updated last year
- On-device LLM Inference Powered by X-Bit Quantization☆268Updated last month
- Fully Open Language Models with Stellar Performance☆247Updated last month
- Generate a llama-quantize command to copy the quantization parameters of any GGUF☆24Updated last month
- ☆85Updated last week
- InferX: Inference as a Service Platform☆135Updated this week
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and…☆50Updated 4 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others.☆33Updated last week
- Simple node proxy for llama-server that enables MCP use☆13Updated 4 months ago