stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆56 · Updated this week
Related projects:
- Efficient LLM Inference Acceleration using Prompting ☆38 · Updated this week
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆123 · Updated 3 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆104 · Updated 3 months ago
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆163 · Updated 4 months ago
- EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs). ☆44 · Updated 3 months ago
- TinyAgent: Function Calling at the Edge! ☆124 · Updated 2 weeks ago
- PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆70 · Updated last year
- ☆38 · Updated 2 weeks ago
- The official implementation of the paper "Demystifying the Compression of Mixture-of-Experts Through a Unified Framework". ☆34 · Updated 2 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆134 · Updated 2 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆45 · Updated 4 months ago
- KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆213 · Updated 3 weeks ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆155 · Updated last month
- ☆29 · Updated 3 weeks ago
- PB-LLM: Partially Binarized Large Language Models ☆143 · Updated 10 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆356 · Updated 2 weeks ago
- ☆117 · Updated 8 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆190 · Updated 11 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆69 · Updated 6 months ago
- KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆282 · Updated last month
- Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…" ☆205 · Updated this week
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆55 · Updated this week
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆51 · Updated 3 months ago
- [ICLR 2024] Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding ☆138 · Updated 6 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆190 · Updated 6 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆68 · Updated 2 months ago
- ☆174 · Updated 4 months ago
- Code for Palu: Compressing KV-Cache with Low-Rank Projection ☆39 · Updated this week
- A minimal implementation of vllm. ☆29 · Updated last month
- Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆28 · Updated 2 months ago