stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
★288 · Updated last month
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the libraries listed below.
- ★66 · Updated last year
- 🎯 Accuracy-first quantization toolkit for LLMs, focusing on minimizing quality degradation across Weight-Only Quantization, MXFP4, NVFP4, … ★793 · Updated this week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ★68 · Updated last year
- TinyChatEngine: On-Device LLM Inference Library ★936 · Updated last year
- Fast Multimodal LLM on Mobile Devices ★1,320 · Updated this week
- High-speed and easy-to-use LLM serving framework for local deployment ★141 · Updated 5 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ★367 · Updated 10 months ago
- The homepage of the OneBit model quantization framework ★197 · Updated 11 months ago
- An innovative library for efficient LLM inference via low-bit quantization ★351 · Updated last year
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ★669 · Updated 8 months ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model ★348 · Updated 8 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods ★198 · Updated 11 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ★351 · Updated 8 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ★323 · Updated last month
- Awesome list for LLM quantization ★375 · Updated 3 months ago
- LLM inference on consumer devices ★128 · Updated 9 months ago
- VPTQ: a flexible and extreme low-bit quantization algorithm ★671 · Updated 8 months ago
- ★41 · Updated 9 months ago
- This repository contains the training code of ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ★116 · Updated 2 months ago
- 1.58-bit LLM on Apple Silicon using MLX ★234 · Updated last year
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ★111 · Updated 2 weeks ago
- A family of compressed models obtained via pruning and knowledge distillation ★362 · Updated 2 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ★454 · Updated 11 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ★325 · Updated 3 months ago
- On-device LLM inference powered by X-bit quantization ★274 · Updated last week
- ★219 · Updated 11 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ★906 · Updated 7 months ago
- ★576 · Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ★229 · Updated last year
- Efficient LLM Inference over Long Sequences ★394 · Updated 6 months ago