mzbac / mlx_sharding
Distributed Inference for MLX LLM
☆79 Updated 5 months ago
Alternatives and similar repositories for mlx_sharding:
Users who are interested in mlx_sharding are comparing it to the libraries listed below.
- ☆109 Updated last month
- A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX. ☆70 Updated last month
- Scripts to create your own moe models using mlx ☆86 Updated 11 months ago
- ☆65 Updated 8 months ago
- Implementation of nougat that focuses on processing pdf locally. ☆75 Updated 2 weeks ago
- ☆151 Updated 6 months ago
- Fast parallel LLM inference for MLX ☆153 Updated 6 months ago
- For inferring and serving local LLMs using the MLX framework ☆91 Updated 10 months ago
- Official homepage for "Self-Harmonized Chain of Thought" ☆89 Updated last week
- ☆38 Updated 10 months ago
- Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with limited amount of… ☆49 Updated 3 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆67 Updated 4 months ago
- Embed anything. ☆28 Updated 8 months ago
- Easy to use, High Performant Knowledge Distillation for LLMs ☆40 Updated 2 weeks ago
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆114 Updated 8 months ago
- ☆70 Updated this week
- Routing on Random Forest (RoRF) ☆100 Updated 4 months ago
- ☆121 Updated last week
- ☆74 Updated last month
- ☆28 Updated 10 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆62 Updated 2 months ago
- Simple examples using Argilla tools to build AI ☆52 Updated 2 months ago
- auto fine tune of models with synthetic data ☆74 Updated 11 months ago
- All the world is a play, we are but actors in it. ☆47 Updated this week
- Testing LLM reasoning abilities with family relationship quizzes. ☆57 Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆120 Updated this week
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆81 Updated last month
- tiny_fnc_engine is a minimal python library that provides a flexible engine for calling functions extracted from a LLM. ☆38 Updated 4 months ago
- Minimal, clean code implementation of RAG with mlx using gguf model weights ☆46 Updated 9 months ago