nath1295 / MLX-Textgen
A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX.
☆72Updated 2 months ago
Alternatives and similar repositories for MLX-Textgen
Users who are interested in MLX-Textgen are comparing it to the libraries listed below:
- ☆24Updated 3 weeks ago
- ☆38Updated 11 months ago
- Very basic framework for parameterized large language model (Q)LoRA / (Q)DoRA fine-tuning using mlx, mlx_lm, and OgbujiPT. Architecture …☆37Updated this week
- Distributed inference for MLX LLMs☆82Updated 6 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs☆46Updated last month
- For inferring and serving local LLMs using the MLX framework☆94Updated 10 months ago
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com.☆114Updated 8 months ago
- The heart of The Pulsar App: fast, secure, and shared inference with a modern UI☆55Updated 2 months ago
- tiny_fnc_engine is a minimal Python library that provides a flexible engine for calling functions extracted from an LLM.☆38Updated 5 months ago
- ☆31Updated last year
- Scripts to create your own MoE models using MLX☆86Updated 11 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs☆68Updated 4 months ago
- ☆29Updated 2 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆89Updated 3 weeks ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe…☆53Updated this week
- MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX.☆96Updated 4 months ago
- This project is a reverse-engineered version of Figma's tone changer. It uses Groq's Llama-3-8b for high-speed inference and to adjust th…☆88Updated 6 months ago
- Experimental LLM Inference UX to aid in creative writing☆112Updated 2 months ago
- MLX port of xjdr's entropix sampler (mimics the JAX implementation)☆63Updated 3 months ago
- Easily view and modify JSON datasets for large language models☆70Updated last week
- Embed anything.☆29Updated 8 months ago
- A Python library to orchestrate LLMs in a neural network-inspired structure☆46Updated 4 months ago
- Grammar checker with a keyboard shortcut for Ollama and Apple MLX with Automator on macOS.☆77Updated last year
- Serving LLMs in the HF-Transformers format via a PyFlask API☆69Updated 5 months ago
- A lightweight, open-source blueprint for building powerful and scalable LLM chat applications☆30Updated 8 months ago
- A little file for doing LLM-assisted prompt expansion and image generation using Flux.schnell - complete with prompt history, prompt queu…☆26Updated 6 months ago
- ☆111Updated 2 months ago
- Fast parallel LLM inference for MLX☆163Updated 7 months ago