exo-explore / mlx-bitnet
1.58 Bit LLM on Apple Silicon using MLX
☆204 · Updated 11 months ago
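For context on what "1.58 bit" means: BitNet b1.58 (the approach from "The Era of 1-bit LLMs") constrains each weight to the ternary set {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information per weight, using absmean scaling. A minimal sketch in plain Python, assuming the absmean round-and-clip rule from that paper; the function names are illustrative, not mlx-bitnet's actual API:

```python
# Hedged sketch of absmean ternary quantization (BitNet b1.58 style):
# scale each weight by the mean absolute value of the tensor, then
# round and clip to {-1, 0, +1}. Not mlx-bitnet's real interface.

def absmean_quantize(weights, eps=1e-8):
    """Quantize a flat list of floats to ternary codes {-1, 0, +1}.

    Returns the ternary codes and the per-tensor scale needed to
    approximately reconstruct the original values.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction: multiply ternary codes by the scale."""
    return [c * scale for c in codes]

if __name__ == "__main__":
    w = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7]
    q, s = absmean_quantize(w)
    print(q)                  # ternary codes, one of -1/0/+1 each
    print(dequantize(q, s))   # lossy reconstruction of w
```

Since every code is one of three values, a real kernel packs them far below one byte per weight and replaces most multiplications with additions and sign flips, which is what makes the format attractive on Apple Silicon.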
Alternatives and similar repositories for mlx-bitnet:
Users interested in mlx-bitnet are comparing it to the libraries listed below.
- Distributed inference for MLX LLMs ☆89 · Updated 9 months ago
- Fast parallel LLM inference for MLX ☆186 · Updated 10 months ago
- SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework. ☆264 · Updated this week
- Scripts to create your own MoE models using MLX ☆89 · Updated last year
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆73 · Updated this week
- 1.58-bit LLaMa model ☆81 · Updated last year
- ☆154 · Updated 9 months ago
- MLX port of xjdr's entropix sampler (mimics the JAX implementation) ☆64 · Updated 6 months ago
- Inference of Mamba models in pure C ☆188 · Updated last year
- Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) + MPS and CUDA. ☆176 · Updated 3 weeks ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 6 months ago
- Experimental BitNet implementation ☆65 · Updated last year
- ☆129 · Updated 8 months ago
- For inferring and serving local LLMs using the MLX framework ☆103 · Updated last year
- FastMLX is a high-performance, production-ready API for hosting MLX models. ☆297 · Updated last month
- Benchmarks comparing PyTorch and MLX on Apple Silicon GPUs ☆79 · Updated 9 months ago
- MLX-Embeddings is a package for running vision and language embedding models locally on your Mac using MLX. ☆147 · Updated 2 weeks ago
- Train your own small BitNet model ☆70 · Updated 6 months ago
- ☆112 · Updated 4 months ago
- A simple example of using MLX for a RAG application running locally on your Apple Silicon device ☆168 · Updated last year
- ☆209 · Updated 3 months ago
- An implementation of Self-Extend, which expands the context window via grouped attention ☆119 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆199 · Updated 9 months ago
- Start a server from the MLX library ☆185 · Updated 9 months ago
- Our own implementation of "Layer-Selective Rank Reduction" ☆237 · Updated 11 months ago
- Port of Andrej Karpathy's nanoGPT to the Apple MLX framework ☆105 · Updated last year
- Low-rank adapter extraction for fine-tuned transformer models ☆173 · Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget ☆149 · Updated last year
- Train your own SOTA deductive reasoning model ☆91 · Updated 2 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆98 · Updated 2 months ago