waybarrios / vllm-mlx
OpenAI-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s.
☆237Updated this week
Alternatives and similar repositories for vllm-mlx
Users who are interested in vllm-mlx are comparing it to the libraries listed below:
- Local coding agent with neat UI☆341Updated 8 months ago
- MLX-GUI MLX Inference Server for Apple Silicon☆176Updated 2 weeks ago
- Qwen Image models through MPS☆256Updated last month
- A web application that converts speech to speech 100% private☆82Updated 8 months ago
- Official python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware.☆636Updated last month
- A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI…☆203Updated this week
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.☆290Updated this week
- MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. I…☆656Updated last month
- A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX.☆100Updated 7 months ago
- Live-bending a foundation model’s output at neural network level.☆273Updated 9 months ago
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work☆281Updated 3 weeks ago
- LLM Client, Server API and UI☆416Updated last week
- FastMLX is a high performance production ready API to host MLX models.☆341Updated 10 months ago
- Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit …☆363Updated 8 months ago
- Ollama-like CLI tool for MLX models on Hugging Face (pull, rm, list, show, serve, etc.)☆126Updated last week
- It takes a village to raise a child: Google DeepThink 🧠 but in LangGraph and free - an original algorithm for collaborative agents using…☆134Updated 2 weeks ago
- This is a cross-platform desktop application that allows you to chat with locally hosted LLMs and enjoy features like MCP support☆227Updated 5 months ago
- Optimized Ollama LLM server configuration for Mac Studio and other Apple Silicon Macs. Headless setup with automatic startup, resource op…☆279Updated last week
- Parse files (e.g. code repos) and websites to clipboard or a file for ingestions by AI / LLMs☆361Updated 2 weeks ago
- An implementation of Nvidia's Parakeet models for Apple Silicon using MLX.☆790Updated 3 weeks ago
- Byte-Vision is a privacy-first document intelligence platform that transforms static documents into an interactive, searchable knowledge …☆71Updated 2 months ago
- fully local, temporally aware natural language file search on your pc! even without a GPU. find relevant files using natural language i…☆166Updated last month
- ☆100Updated 8 months ago
- Enhancing LLMs with LoRA☆206Updated 3 months ago
- Aggregates compute from spare GPU capacity☆189Updated this week
- ☆108Updated 2 months ago
- the composable multi-agent shell☆199Updated this week
- Git Based Memory Storage for Conversational AI Agent☆772Updated last week
- ☆440Updated last month
- Your gateway to both Ollama & Apple MLX models☆149Updated 11 months ago