OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
☆511 · Feb 25, 2026 · Updated last week
Alternatives and similar repositories for vllm-mlx
Users interested in vllm-mlx are comparing it to the libraries listed below.
- This repo maintains a 'cheat sheet' for LLMs that are undertrained on mlx ☆18 · Mar 15, 2025 · Updated 11 months ago
- Context Query language for Agents ☆45 · Feb 21, 2026 · Updated 2 weeks ago
- A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI… ☆248 · Updated this week
- Tiny evaluation of leading LLMs on competitive programming problems ☆14 · Nov 28, 2024 · Updated last year
- Fully automated memory and context management for Claude Code using hooks - Zero friction, zero context loss ☆22 · Oct 22, 2025 · Updated 4 months ago
- Intelligent model orchestration for Claude Code - routes queries to optimal Claude model (Haiku/Sonnet/Opus) based on complexity. It also… ☆31 · Jan 26, 2026 · Updated last month
- Variable manager for lorebook / world info entries. ☆16 · Nov 29, 2023 · Updated 2 years ago
- A plugin/skill to search other plugins/skills ☆18 · Feb 5, 2026 · Updated last month
- ☆18 · Aug 19, 2025 · Updated 6 months ago
- Agentic BYOK Browser-Based Website Builder ☆30 · Updated this week
- Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon ☆16 · May 8, 2025 · Updated 9 months ago
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- 🤖 Complete reproduction of 'AlphaGo Moment for Model Architecture Discovery' using MLX-LM instead of GPT-4. Autonomous neural architectu… ☆27 · Jul 27, 2025 · Updated 7 months ago
- Fast parallel LLM inference for MLX ☆247 · Jul 7, 2024 · Updated last year
- 🚀 SuperMCP - Create multiple isolated MCP servers using a single connector. Build powerful Model Context Protocol integrations for datab… ☆53 · Jan 26, 2026 · Updated last month
- GenAI & agent toolkit for Apple Silicon Mac, implementing JSON schema-steered structured output (3SO) and tool-calling in Python. For mor… ☆132 · Feb 27, 2026 · Updated last week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆2,212 · Updated this week
- FastMLX is a high performance production ready API to host MLX models. ☆25 · Nov 18, 2024 · Updated last year
- ☆21 · Oct 9, 2024 · Updated last year
- Azure Active Directory Authentication plugin for ServiceStack ☆19 · Apr 20, 2018 · Updated 7 years ago
- AI-free static security scanner for Claude Code artifacts (Skills, Hooks, MCP configs). Detects data exfiltration, prompt injection, and … ☆17 · Updated this week
- MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2 Ultra, M3 Max). ☆25 · Dec 16, 2023 · Updated 2 years ago
- Important Docker Commands Ordered by Compute (Services), Network, Storage then by Docker CLI, Dockerfile, Compose, and Swarm ☆19 · Apr 18, 2023 · Updated 2 years ago
- Qwen Image models through MPS ☆260 · Dec 31, 2025 · Updated 2 months ago
- MLX-GUI: MLX Inference Server for Apple Silicon ☆194 · Jan 13, 2026 · Updated last month
- Compression suite for data frames and tabular data files, csv, excel etc. Using LZHW algorithm. ☆30 · Aug 17, 2024 · Updated last year
- Train Large Language Models on MLX. ☆273 · Feb 27, 2026 · Updated last week
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati… ☆29 · May 6, 2025 · Updated 10 months ago
- Artificial Neural Engine Machine Learning Library ☆1,351 · Feb 27, 2026 · Updated last week
- MLX native implementations of state-of-the-art generative image models ☆1,847 · Feb 27, 2026 · Updated last week
- Instant Perfect Native macOS Transcription ☆52 · Jul 26, 2025 · Updated 7 months ago
- Something similar to Apple Intelligence? ☆60 · Jul 3, 2024 · Updated last year
- Plugin QGIS ☆10 · Jan 16, 2023 · Updated 3 years ago
- Run LLMs with MLX ☆3,769 · Feb 28, 2026 · Updated last week
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆64 · Feb 19, 2026 · Updated 2 weeks ago
- A wannabe Ollama equivalent for Apple MLX models ☆84 · Mar 2, 2025 · Updated last year
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆86 · Aug 20, 2025 · Updated 6 months ago
- Project-agnostic, composable configuration system for AI-assisted development workflows. Single source of truth for agentic tools (Claude… ☆23 · Feb 24, 2026 · Updated last week
- Never lose context again with a persistent, queryable memory system for AI agents and development teams. ☆18 · Jan 29, 2026 · Updated last month