smpanaro / apple-silicon-4bit-quant
Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"
⭐ 11 · Updated last year
Alternatives and similar repositories for apple-silicon-4bit-quant
Users interested in apple-silicon-4bit-quant are comparing it to the libraries listed below.
- ModernBERT model optimized for the Apple Neural Engine. ⭐ 29 · Updated 11 months ago
- Profile your CoreML models directly from Python. ⭐ 29 · Updated 3 months ago
- Find out why your CoreML model isn't running on the Neural Engine! ⭐ 28 · Updated last year
- SmolVLM2 Demo ⭐ 180 · Updated 9 months ago
- 1.58-bit LLM on Apple Silicon using MLX ⭐ 233 · Updated last year
- QuIP quantization ⭐ 61 · Updated last year
- Implementation of Nougat that focuses on processing PDFs locally. ⭐ 83 · Updated 11 months ago
- C API for MLX ⭐ 157 · Updated last week
- Distributed inference for MLX LLMs ⭐ 99 · Updated last year
- KAN (Kolmogorov–Arnold Networks) in the MLX framework for Apple Silicon ⭐ 31 · Updated 6 months ago
- ⭐ 66 · Updated 6 months ago
- Lightweight toolkit for training and fine-tuning 1.58-bit language models ⭐ 104 · Updated 7 months ago
- ⭐ 219 · Updated 11 months ago
- Advanced ultra-low-bitrate compression techniques for the LLaMA family of LLMs ⭐ 110 · Updated last year
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ⭐ 84 · Updated 4 months ago
- MLX image models for Apple Silicon machines ⭐ 88 · Updated last month
- 1.58-bit LLaMA model ⭐ 83 · Updated last year
- RWKV-7: Surpassing GPT ⭐ 102 · Updated last year
- Inference of Mamba models in pure C ⭐ 196 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ⭐ 155 · Updated last year
- ⭐ 52 · Updated last year
- ⭐ 51 · Updated last year
- MLX implementations of various transformers, speedups, training ⭐ 33 · Updated 2 years ago
- A collection of optimizers for MLX ⭐ 54 · Updated 2 weeks ago
- MLX Transformers is a library that provides model implementations in MLX. It uses a similar model interface as HuggingFace Transformers an… ⭐ 69 · Updated last year
- ⭐ 68 · Updated last year
- Repo hosting code and materials related to speeding up LLM inference using token merging. ⭐ 37 · Updated 2 months ago
- Samples of good AI-generated CUDA kernels ⭐ 95 · Updated 7 months ago
- MLX port of xjdr's entropix sampler (mimics the JAX implementation) ⭐ 62 · Updated last year
- FlashAttention (Metal port) ⭐ 569 · Updated last year