stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆226 · Updated last week
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the libraries listed below.
- Fast Multimodal LLM on Mobile Devices ☆983 · Updated this week
- High-speed and easy-to-use LLM serving framework for local deployment ☆115 · Updated 4 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆66 · Updated 10 months ago
- ☆59 · Updated 8 months ago
- TinyChatEngine: On-Device LLM Inference Library ☆882 · Updated last year
- VPTQ: a flexible, extreme low-bit quantization algorithm ☆648 · Updated 3 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆324 · Updated 3 months ago
- ☆215 · Updated 6 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆221 · Updated 6 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 11 months ago
- 🤗 Optimum ExecuTorch ☆58 · Updated last week
- [ICLR 2025 SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆653 · Updated 2 months ago
- Scalable and robust tree-based speculative decoding algorithm (a toy accept/reject step is sketched after this list) ☆354 · Updated 6 months ago
- Code for compression methods for transformers, accompanying our publications ☆437 · Updated 6 months ago
- The homepage of the OneBit model quantization framework ☆185 · Updated 6 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆289 · Updated 2 months ago
- Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆564 · Updated last week
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods ☆192 · Updated 6 months ago
- Awesome list for LLM quantization (the quantization sketch after this list illustrates the shared baseline idea) ☆260 · Updated last month
- 1.58-bit LLM on Apple Silicon using MLX ☆217 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆225 · Updated 8 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆347 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated 9 months ago
- Efficient LLM Inference over Long Sequences ☆385 · Updated last month
- ☆549 · Updated 9 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆307 · Updated 5 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆836 · Updated 2 months ago
- Simple extension on vLLM to help you speed up reasoning models without training ☆172 · Updated 2 months ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,315 · Updated 3 months ago
- A collection of all available inference solutions for LLMs ☆91 · Updated 5 months ago
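
Many of the entries above (MobileQuant, BiLLM, VPTQ, OneBit, EfficientQAT, SpinQuant, the low-bit lookup-table engine) revolve around low-bit weight quantization. For orientation only, here is a minimal NumPy sketch of per-tensor symmetric quantization, the naive baseline these methods improve on; it is not taken from any listed repository, and all names are illustrative:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4):
    """Per-tensor symmetric quantization: map floats to signed ints in
    [-(2^(b-1) - 1), 2^(b-1) - 1] using a single scale factor."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Toy weight matrix: quantize, dequantize, measure the round-trip error.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_symmetric(w, bits=4)
print("mean abs error:", np.abs(w - dequantize(q, s)).mean())
```

The listed methods refine this baseline in different ways, e.g. finer-grained (per-group) scales, learned rotations before quantization (SpinQuant), or quantization-aware training (EfficientQAT), to recover the accuracy this naive scheme loses.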
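Several other entries (LayerSkip's self-speculative decoding, the tree-based speculative decoder, the vLLM reasoning extension) build on speculative decoding. Below is a toy single-token accept/reject step following the standard speculative sampling rule — accept a draft token with probability min(1, p/q), otherwise resample from the normalized residual; this is a generic sketch, not code from any listed library:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(probs: np.ndarray) -> int:
    return rng.choice(len(probs), p=probs)

def speculative_step(p_target: np.ndarray, q_draft: np.ndarray):
    """One accept/reject step of speculative sampling: the cheap draft model
    proposes a token x ~ q; the target accepts it with prob min(1, p[x]/q[x]),
    otherwise resamples from the residual max(p - q, 0) renormalized."""
    x = sample(q_draft)
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True                     # draft token accepted
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return sample(residual), False         # rejected: resample from residual

# Demo with random target/draft distributions over a 32-token vocabulary.
p = rng.dirichlet(np.ones(32))
q = rng.dirichlet(np.ones(32))
print(speculative_step(p, q))
```

This scheme is distribution-preserving: accepted or resampled, the output token is distributed exactly according to the target model, which is why the listed systems can use it to trade draft-model compute for wall-clock speedups without changing outputs.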