OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
☆780 · Apr 1, 2026 · Updated last week
Alternatives and similar repositories for vllm-mlx
Users that are interested in vllm-mlx are comparing it to the libraries listed below.
- A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI… ☆290 · Updated this week
- Agentic BYOK Browser-Based Website Builder ☆39 · Updated this week
- This repo maintains a 'cheat sheet' for LLMs that are undertrained on mlx ☆32 · Mar 12, 2026 · Updated 3 weeks ago
- Voxel-based Editor ☆13 · Jul 11, 2018 · Updated 7 years ago
- ☆13 · Jan 25, 2026 · Updated 2 months ago
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- Tiny evaluation of leading LLMs on competitive programming problems ☆14 · Nov 28, 2024 · Updated last year
- Context Query language for Agents ☆63 · Mar 29, 2026 · Updated last week
- Fast parallel LLM inference for MLX ☆249 · Jul 7, 2024 · Updated last year
- Character-level conversion between Hebrew text and Latin transliteration using deep learning - a demonstration of seq2seq training. ☆14 · Jun 27, 2023 · Updated 2 years ago
- MLX native implementations of state-of-the-art generative image models ☆1,961 · Mar 28, 2026 · Updated last week
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆65 · Mar 30, 2026 · Updated last week
- ☆72 · Apr 11, 2025 · Updated 11 months ago
- 🚀 SuperMCP - Create multiple isolated MCP servers using a single connector. Build powerful Model Context Protocol integrations for datab… ☆55 · Jan 26, 2026 · Updated 2 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆4,138 · Updated this week
- Train Large Language Models on MLX. ☆287 · Mar 30, 2026 · Updated last week
- 🤖 Complete reproduction of 'AlphaGo Moment for Model Architecture Discovery' using MLX-LM instead of GPT-4. Autonomous neural architectu… ☆27 · Jul 27, 2025 · Updated 8 months ago
- Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon ☆16 · May 8, 2025 · Updated 11 months ago
- ☆15 · Feb 23, 2026 · Updated last month
- Semantic memory system for Claude Code - provides persistent conversation memory through vector search of session summaries ☆32 · Jul 21, 2025 · Updated 8 months ago
- MLX-GUI MLX Inference Server for Apple Silicon ☆207 · Apr 1, 2026 · Updated last week
- A simple, observable code-writing agent builder in TypeScript. ☆32 · Apr 9, 2025 · Updated last year
- FastMLX is a high performance production ready API to host MLX models. ☆25 · Nov 18, 2024 · Updated last year
- GitHub to Karakeep exporter. ☆37 · Feb 3, 2026 · Updated 2 months ago
- ☆21 · Oct 9, 2024 · Updated last year
- Implementation of ModernBERT in MLX ☆20 · Jan 7, 2026 · Updated 3 months ago
- ☆12 · Oct 17, 2022 · Updated 3 years ago
- Artificial Neural Engine Machine Learning Library ☆1,552 · Mar 10, 2026 · Updated 3 weeks ago
- A script that uses GPT-4 to automatically evaluate language model responses ☆16 · Jun 6, 2024 · Updated last year
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… ☆33 · Oct 30, 2025 · Updated 5 months ago
- Run LLMs with MLX ☆4,373 · Apr 3, 2026 · Updated last week
- Fully automated memory and context management for Claude Code using hooks - Zero friction, zero context loss ☆26 · Oct 22, 2025 · Updated 5 months ago
- Anomaly detection using RAG ☆17 · Apr 22, 2024 · Updated last year
- Instant Perfect Native MacOS Transcription ☆53 · Jul 26, 2025 · Updated 8 months ago
- A repo of useful MLX skills. ☆80 · Jan 25, 2026 · Updated 2 months ago
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati… ☆29 · May 6, 2025 · Updated 11 months ago
- Command-line interface for Chrome automation using DevTools Protocol ☆19 · Jun 20, 2025 · Updated 9 months ago
- On-device semantic search over Apple WWDC 2025 docs using MLX embeddings — SwiftUI app (WWDC OMT 2025) ☆76 · Jun 12, 2025 · Updated 9 months ago
- 🎩 AIfred - Multi-Agent AI Assistant with TTS, RAG, Web Research & Voice capabilities. Supports Ollama, vLLM, TabbyAPI. Features AIfred/S… ☆26 · Updated this week