stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆282 · Updated 3 weeks ago
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the libraries listed below.
- High-speed and easy-to-use LLM serving framework for local deployment ☆139 · Updated 4 months ago
- Awesome list for LLM quantization (see the quantization sketch after this list) ☆370 · Updated 2 months ago
- Fast Multimodal LLM on Mobile Devices ☆1,277 · Updated last week
- ☆65 · Updated last year
- TinyChatEngine: On-Device LLM Inference Library ☆932 · Updated last year
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆68 · Updated last year
- Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Tra… ☆764 · Updated this week
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆670 · Updated 7 months ago
- On-device LLM Inference Powered by X-Bit Quantization ☆273 · Updated last week
- LLM-Inference-Bench ☆56 · Updated 5 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆356 · Updated 10 months ago
- Scalable and robust tree-based speculative decoding algorithm (see the speculative-decoding sketch after this list) ☆365 · Updated 10 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆349 · Updated 7 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆317 · Updated 3 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆249 · Updated last year
- The homepage of the OneBit model quantization framework ☆196 · Updated 10 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods ☆197 · Updated 11 months ago
- ☆219 · Updated 10 months ago
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆254 · Updated 6 months ago
- This repository contains the training code of ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆116 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 2 weeks ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆220 · Updated this week
- ☆41 · Updated 8 months ago
- LLM Inference on consumer devices ☆128 · Updated 9 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆361 · Updated last month
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆106 · Updated last week
- ☆101 · Updated 3 weeks ago
- Awesome LLMs on Device: A Comprehensive Survey ☆1,288 · Updated 11 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆452 · Updated 11 months ago
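
Many of the entries above (MobileQuant, VPTQ, SpinQuant, EfficientQAT, OneBit, ParetoQ) revolve around low-bit weight quantization. As a rough illustration of the core idea only, and not code from any listed project, here is a minimal per-group symmetric int4 weight quantization sketch in plain PyTorch; the function names and the group size of 128 are hypothetical choices for the example.

```python
import torch

def quantize_weights_int4(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group int4 quantization of a 2-D weight matrix.

    Illustrative only: the listed libraries use far more sophisticated
    schemes (calibration data, learned rotations, vector codebooks, QAT).
    """
    out_features, in_features = w.shape
    # Split each row into groups so every group gets its own scale.
    w_grouped = w.reshape(out_features, in_features // group_size, group_size)
    # Symmetric scale: map the max |value| per group to 7 (int4 range is [-8, 7]).
    scales = w_grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w_grouped / scales), -8, 7).to(torch.int8)
    return q, scales

def dequantize_int4(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Reconstruct an approximation of the original weights.
    return (q.to(torch.float32) * scales).reshape(q.shape[0], -1)

w = torch.randn(256, 1024)
q, s = quantize_weights_int4(w)
w_hat = dequantize_int4(q, s)
print((w - w_hat).abs().mean())  # mean quantization error
```

The per-group scales are what the "group size" knob in most of these toolkits controls: smaller groups mean lower quantization error but more scale overhead per stored weight.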
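
Several other entries (the tree-based speculative decoding repo, LayerSkip's self-speculative decoding) build on the draft-and-verify idea. The sketch below shows the basic greedy loop with a single draft sequence; the `draft_model`/`target_model` toy rules are stand-ins invented for this example, where real systems use a small draft LLM and a large target LLM, and the tree-based and self-speculative variants above generalize this single-sequence loop.

```python
# Hypothetical stand-ins: each "model" maps a token sequence to a next token.
def draft_model(tokens):
    # Fast, approximate next-token rule.
    return (sum(tokens) * 31 + len(tokens)) % 100

def target_model(tokens):
    # Slower, authoritative rule; disagrees with the draft occasionally.
    bump = 1 if len(tokens) % 5 == 0 else 0
    return (sum(tokens) * 31 + len(tokens) + bump) % 100

def speculative_decode(prompt, n_new, k=4):
    """Greedy draft-and-verify loop.

    The draft model proposes k tokens per round; the target model re-scores
    them and we keep the longest verified prefix. In a real implementation
    the k verifications are a single batched forward pass of the target
    model, which is where the speedup comes from.
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # 1. Draft k tokens cheaply.
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # 2. Verify: accept draft tokens while the target model agrees.
        accepted = []
        for i, t in enumerate(draft):
            expected = target_model(tokens + draft[:i])
            if t == expected:
                accepted.append(t)
            else:
                accepted.append(expected)  # target's correction ends the round
                break
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new]

print(speculative_decode([1, 2, 3], n_new=8))
```

Each round advances by at least one token (the target's correction), so the loop always terminates; the more often the draft agrees with the target, the closer the cost gets to one target pass per k tokens.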