willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆118 · Updated 2 months ago
Related projects:
- 1.58-bit LLM on Apple Silicon using MLX (☆96 · Updated 4 months ago)
- Inference and serving for local LLMs using the MLX framework (☆77 · Updated 5 months ago)
- Start a server from the MLX library (☆157 · Updated last month)
- FastMLX, a high-performance, production-ready API for hosting MLX models (☆163 · Updated last week)
- Scripts to create your own MoE models using MLX (☆86 · Updated 6 months ago)
- MLX-VLM, a package for running vision LLMs locally on your Mac using MLX (☆187 · Updated this week)
- Phi-3.5 for Mac: locally run vision and language models for Apple Silicon (☆206 · Updated last week)
- An implementation of Self-Extend, expanding the context window via grouped attention (☆117 · Updated 8 months ago)
- A simple UI / web frontend for MLX mlx-lm using Streamlit (☆219 · Updated 2 months ago)
- A collection of benchmark logs for different LLMs (☆112 · Updated last month)
- Run embeddings in MLX (☆68 · Updated last month)
- Distributed inference for MLX LLMs (☆57 · Updated last month)
- MLX-Embeddings, a package for running vision and language embedding models locally on your Mac using MLX (☆60 · Updated 3 weeks ago)
- Generate synthetic data using OpenAI, MistralAI, or AnthropicAI (☆223 · Updated 4 months ago)
- A comprehensive repository of reasoning tasks for LLMs and beyond (☆260 · Updated last month)
- Run PaliGemma in real time (☆122 · Updated 4 months ago)
- A simple example of using MLX for a RAG application running locally on your Apple Silicon device (☆144 · Updated 7 months ago)
- Q-GaLore: Quantized GaLore with INT4 projection and layer-adaptive low-rank gradients (☆158 · Updated 2 months ago)
- 1.58-bit LLaMA model (☆77 · Updated 5 months ago)
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B for free (☆217 · Updated 6 months ago)
- 🤖 Headless IDE for AI agents (☆110 · Updated this week)
- Low-rank adapter extraction for fine-tuned transformer models (☆154 · Updated 4 months ago)
- Steer LLM outputs toward a given topic or subject and enhance response capabilities via activation engineering, by adding steering vectors (☆192 · Updated 4 months ago)