waybarrios / vllm-mlx

OpenAI-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s.

☆237

Alternatives and similar repositories for vllm-mlx

Users who are interested in vllm-mlx are comparing it to the libraries listed below.

rockbite / localforge
Local coding agent with neat UI
☆341 Updated 8 months ago
RamboRogers / mlx-gui
MLX-GUI MLX Inference Server for Apple Silicon
☆176 Updated 2 weeks ago
ivanfioravanti / qwen-image-mps
Qwen Image models through MPS
☆256 Updated last month
rhulha / Speech2Speech
A web application that converts speech to speech, 100% private
☆82 Updated 8 months ago
universal-tool-calling-protocol / python-utcp
Official Python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware.
☆636 Updated last month
cubist38 / mlx-openai-server
A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI…
☆203 Updated this week
SearchSavior / OpenArc
Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.
☆290 Updated this week
madroidmaq / mlx-omni-server
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. I…
☆656 Updated last month
nath1295 / MLX-Textgen
A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX.
☆100 Updated 7 months ago
babycommando / neuralgraffiti
Live-bending a foundation model’s output at neural network level.
☆273 Updated 9 months ago
iluxu / llmbasedos
llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work
☆281 Updated 3 weeks ago
ServiceStack / llms
LLM Client, Server API and UI
☆416 Updated last week
arcee-ai / fastmlx
FastMLX is a high performance production ready API to host MLX models.
☆341 Updated 10 months ago
dipampaul17 / KVSplit
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit …
☆363 Updated 8 months ago
mzau / mlx-knife
Ollama-like CLI tool for MLX models on Hugging Face (pull, rm, list, show, serve, etc.)
☆126 Updated last week
iblameandrew / local-deepthink
It takes a village to raise a child: Google DeepThink 🧠 but in LangGraph and free - an original algorithm for collaborative agents using…
☆134 Updated 2 weeks ago
platinum-hill / cobolt
This is a cross-platform desktop application that allows you to chat with locally hosted LLMs and enjoy features like MCP support
☆227 Updated 5 months ago
anurmatov / mac-studio-server
Optimized Ollama LLM server configuration for Mac Studio and other Apple Silicon Macs. Headless setup with automatic startup, resource op…
☆279 Updated last week
sammcj / ingest
Parse files (e.g. code repos) and websites to the clipboard or a file for ingestion by AI / LLMs
☆361 Updated 2 weeks ago
senstella / parakeet-mlx
An implementation of Nvidia's Parakeet models for Apple Silicon using MLX.
☆790 Updated 3 weeks ago
kbrisso / byte-vision
Byte-Vision is a privacy-first document intelligence platform that transforms static documents into an interactive, searchable knowledge …
☆71 Updated 2 months ago
monkesearch / monkeSearch
fully local, temporally aware natural language file search on your pc! even without a GPU. find relevant files using natural language i…
☆166 Updated last month
Vaibhavs10 / experiments-with-mcp
☆100 Updated 8 months ago
codelion / ellora
Enhancing LLMs with LoRA
☆206 Updated 3 months ago
kalavai-net / kalavai-client
Aggregates compute from spare GPU capacity
☆189 Updated this week
sam-paech / auto-antislop
☆108 Updated 2 months ago
NPC-Worldwide / npcsh
the composable multi-agent shell
☆199 Updated this week
Growth-Kinetics / DiffMem
Git Based Memory Storage for Conversational AI Agent
☆772 Updated last week
leoheuler / flashtensors
☆440 Updated last month
kspviswa / pyOllaMx
Your gateway to both Ollama & Apple MLX models
☆149 Updated 11 months ago