stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆301 Updated 2 months ago
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the libraries listed below.
- High-speed and easy-to-use LLM serving framework for local deployment ☆145 Updated 5 months ago
- Fast Multimodal LLM on Mobile Devices ☆1,370 Updated this week
- TinyChatEngine: On-Device LLM Inference Library ☆941 Updated last year
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆68 Updated last year
- 1.58 Bit LLM on Apple Silicon using MLX ☆242 Updated last year
- On-device LLM Inference Powered by X-Bit Quantization ☆278 Updated last week
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ☆839 Updated this week
- ☆116 Updated this week
- ☆65 Updated last year
- For releasing code related to compression methods for transformers, accompanying our publications ☆455 Updated last year
- ☆42 Updated 10 months ago
- Awesome list for LLM quantization ☆384 Updated 3 months ago
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆674 Updated 9 months ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,406 Updated 9 months ago
- ☆219 Updated last year
- Low-bit LLM inference on CPU/NPU with lookup table ☆915 Updated 7 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆198 Updated last year
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆373 Updated 11 months ago
- ☆576 Updated last year
- Scalable and robust tree-based speculative decoding algorithm ☆366 Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆355 Updated last week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆113 Updated this week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆258 Updated last year
- ☆786 Updated this week
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆262 Updated 8 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆229 Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ☆352 Updated last year
- Efficient LLM Inference over Long Sequences ☆394 Updated 7 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆327 Updated 4 months ago
- The homepage of OneBit model quantization framework. ☆200 Updated 11 months ago