nath1295 / MLX-TextgenLinks

A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX.

☆98

Alternatives and similar repositories for MLX-Textgen

Users that are interested in MLX-Textgen are comparing it to the libraries listed below

Sorting:

mzbac / mlx_sharding
Distributed Inference for mlx LLm
☆98Updated last year
chimezie / mlx-tuning-fork
Very basic framework for composable parameterized large language model (Q)LoRA / (Q)Dora fine-tuning using mlx, mlx_lm, and OgbujiPT.
☆43Updated 4 months ago
mzbac / mlx-llm-server
For inferring and serving local LLMs using the MLX framework
☆109Updated last year
remichu-ai / gallama
☆133Updated 6 months ago
TheProxyCompany / proxy-structuring-engine
Guaranteed Structured Output from any Language Model via Hierarchical State Machines
☆145Updated last month
QuixiAI / dolphin-logger
☆107Updated 2 weeks ago
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆44Updated 2 weeks ago
severian42 / Computational-Model-for-Symbolic-Representations
Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI …
☆53Updated 9 months ago
matteoserva / GraphLLM
☆207Updated 2 months ago
Goekdeniz-Guelmez / mlx-lm-lora
Train Large Language Models on MLX.
☆213Updated last week
rodrigobaron / anthill
☆24Updated 9 months ago
chigkim / Ollama-MMLU-Pro
☆106Updated 2 months ago
JosefAlbers / Phi-3-Vision-MLX
Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon
☆274Updated last week
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆227Updated last year
mzau / mlx-knife
ollama like cli tool for MLX models on huggingface (pull, rm, list, show, serve etc.)
☆113Updated last week
OoriData / Toolio
GenAI & agent toolkit for Apple Silicon Mac, implementing JSON schema-steered structured output (3SO) and tool-calling in Python. For mor…
☆128Updated last week
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆83Updated 2 weeks ago
abgulati / hf-waitress
Serving LLMs in the HF-Transformers format via a PyFlask API
☆71Updated last year
av / klmbr
klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs
☆84Updated last year
otriscon / llm-structured-output
☆89Updated 9 months ago
mark-lord / MLX-text-completion-notebook
A simple Jupyter Notebook for learning MLX text-completion fine-tuning!
☆122Updated last year
RandomInternetPreson / Lucid_Autonomy
An extension that lets the AI take the wheel, allowing it to use the mouse and keyboard, recognize UI elements, and prompt itself :3...no…
☆128Updated last year
SomeOddCodeGuy / OfflineWikipediaTextApi
This small API downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, taking in a query and returning full …
☆100Updated 2 months ago
armbues / SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
☆281Updated 5 months ago
gradion-ai / freeact
An AI agent library using Python as the common language to define executable actions and tool interfaces.
☆85Updated last week
Vaibhavs10 / experiments-with-mcp
☆101Updated 5 months ago
mark-lord / PromptExpander-Diffusionkit
A little file for doing LLM-assisted prompt expansion and image generation using Flux.schnell - complete with prompt history, prompt queu…
☆26Updated last year
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆84Updated 2 months ago
monk1337 / auto-ollama
run ollama & gguf easily with a single command
☆52Updated last year
beratcmn / local-intelligence
Something similar to Apple Intelligence?
☆61Updated last year