A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Built in Python on the FastAPI framework, it offers an efficient, scalable, and user-friendly way to run MLX-based vision and language models locally behind an OpenAI-compatible interface.
☆301 · Apr 13, 2026 · Updated this week
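Because the server speaks the OpenAI wire protocol, any OpenAI-style client can talk to it. Below is a minimal sketch using only the Python standard library; the base URL, port, and model id are illustrative assumptions, not values taken from the project's documentation.

```python
# Sketch: calling an OpenAI-compatible /chat/completions endpoint.
# BASE_URL and MODEL are assumptions for illustration only.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local address; check the server docs
MODEL = "mlx-community/Qwen2.5-7B-Instruct-4bit"  # hypothetical model id

def chat_payload(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build the JSON body the OpenAI chat-completions API expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(chat_payload(MODEL, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# reply = chat("Say hello in one word.")  # requires the server to be running
```

The official `openai` Python package works the same way: point its `base_url` at the local server and use any placeholder API key.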
Alternatives and similar repositories for mlx-openai-server
Users interested in mlx-openai-server are comparing it to the repositories listed below.
- MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. I… (☆694, Mar 10, 2026, updated last month)
- An ollama-like CLI tool for MLX models on Hugging Face (pull, rm, list, show, serve, etc.) (☆140, updated this week)
- FastMLX is a high-performance, production-ready API to host MLX models. (☆352, Mar 18, 2025, updated last year)
- OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous bat… (☆780, Apr 1, 2026, updated 2 weeks ago)
- High-performance MLX-based LLM inference engine for macOS with a native Swift implementation (☆544, Apr 6, 2026, updated last week)
- Minimal Claude Code alternative powered by MLX (☆46, Jan 11, 2026, updated 3 months ago)
- MLX-GUI, an MLX inference server for Apple Silicon (☆208, Apr 1, 2026, updated 2 weeks ago)
- Train large language models on MLX. (☆291, Mar 30, 2026, updated 2 weeks ago)
- Real-time webcam demo with SmolVLM (mlx-community/SmolVLM-Instruct-4bit) and MLX-VLM (☆25, Jun 12, 2025, updated 10 months ago)
- Run LLMs with MLX (☆4,654, Apr 8, 2026, updated last week)
- ☆35, Feb 14, 2026, updated 2 months ago
- Fast parallel LLM inference for MLX (☆249, Jul 7, 2024, updated last year)
- Decision orchestration and reconciliation for AI changes. (☆38, Mar 30, 2026, updated 2 weeks ago)
- MLX-VLM is a package for inference and fine-tuning of vision-language models (VLMs) on your Mac using MLX. (☆4,333, Apr 9, 2026, updated last week)
- ☆20, Oct 25, 2025, updated 5 months ago
- CLI tool for text-to-image generation using the FLUX.1 model. (☆67, Jun 28, 2025, updated 9 months ago)
- Instant, accurate native macOS transcription (☆53, Jul 26, 2025, updated 8 months ago)
- A command-line utility to manage MLX models between your Hugging Face cache and LM Studio. (☆84, Nov 11, 2025, updated 5 months ago)
- The open-source adapter for working with Kuzu databases and Cypher queries in Jupyter notebooks, leveraging the yFiles Graphs for Jupyter … (☆22, Feb 3, 2026, updated 2 months ago)
- Clone your friends with iMessage and MLX (☆34, Jan 9, 2024, updated 2 years ago)
- A framework for building programmable applications (☆29, Jan 26, 2023, updated 3 years ago)
- FastMLX is a high-performance, production-ready API to host MLX models. (☆25, Nov 18, 2024, updated last year)
- ☆22, Dec 12, 2025, updated 4 months ago
- Zed extension for Exa's MCP server (☆22, Mar 11, 2026, updated last month)
- Simple tool caller for llama.cpp (☆11, Aug 12, 2024, updated last year)
- The fastest way to scaffold FastHTML applications. (☆36, Sep 13, 2025, updated 7 months ago)
- dspy-cli is a tool for creating, developing, testing, and deploying DSPy programs as HTTP APIs. (☆122, Mar 3, 2026, updated last month)
- Generate a llama-quantize command to copy the quantization parameters of any GGUF (☆31, Jan 23, 2026, updated 2 months ago)
- Sample project for F5-TTS using MLX Swift (☆51, Jan 15, 2026, updated 3 months ago)
- A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon. (☆238, Oct 28, 2025, updated 5 months ago)
- Find the hidden meaning of LLMs (☆41, Nov 13, 2025, updated 5 months ago)
- SiLLM simplifies training and running large language models (LLMs) on Apple Silicon by leveraging the MLX framework. (☆284, Jun 16, 2025, updated 10 months ago)
- Audio transcription using MLX Whisper and VAD silence processing (☆17, Oct 14, 2024, updated last year)
- Qwen Image models through MPS (☆265, Dec 31, 2025, updated 3 months ago)
- ☆15, Feb 23, 2026, updated last month
- JavaScript multivariate data visualization (☆14, Jan 10, 2017, updated 9 years ago)
- OpenSCAD library to improve 3D-printed vertical holes (☆14, Nov 23, 2017, updated 8 years ago)
- Implementation of ModernBERT in MLX (☆20, Jan 7, 2026, updated 3 months ago)
- An input component for controlling your app in natural language using an LLM through LangChain.dart (☆14, Nov 1, 2024, updated last year)