stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆246 · Updated last week
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the libraries listed below.
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆67 · Updated last year
- High-speed and easy-to-use LLM serving framework for local deployment ☆119 · Updated last month
- Fast Multimodal LLM on Mobile Devices ☆1,060 · Updated this week
- ☆63 · Updated 10 months ago
- TinyChatEngine: On-Device LLM Inference Library ☆896 · Updated last year
- Awesome list for LLM quantization ☆309 · Updated this week
- ☆217 · Updated 8 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated last year
- Scalable and robust tree-based speculative decoding algorithm ☆358 · Updated 7 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆195 · Updated 8 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆338 · Updated 4 months ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆81 · Updated last week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆234 · Updated 10 months ago
- ☆38 · Updated 5 months ago
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. ☆638 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆162 · Updated this week
- The homepage of the OneBit model quantization framework. ☆192 · Updated 7 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆302 · Updated 4 months ago
- KV cache compression for high-throughput LLM inference ☆138 · Updated 7 months ago
- ☆97 · Updated 11 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆445 · Updated 8 months ago
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆243 · Updated 3 months ago
- VPTQ: a flexible, extreme low-bit quantization algorithm ☆658 · Updated 5 months ago
- Compressing Large Language Models using Low Precision and Low Rank Decomposition ☆99 · Updated 9 months ago
- LLM Inference on consumer devices ☆124 · Updated 6 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆328 · Updated 7 months ago
- [NeurIPS 2025] A simple extension to vLLM that speeds up reasoning models without training. ☆194 · Updated 3 months ago
- 1.58-bit LLM on Apple Silicon using MLX ☆223 · Updated last year
- Efficient LLM Inference over Long Sequences ☆391 · Updated 3 months ago
- ☆558 · Updated 10 months ago