stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆204 · Updated 3 weeks ago
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the repositories listed below.
- ☆57 · Updated 7 months ago
- Fast Multimodal LLM on Mobile Devices ☆929 · Updated 2 weeks ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆63 · Updated 9 months ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆112 · Updated 3 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆311 · Updated last month
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆273 · Updated last month
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆288 · Updated 4 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆343 · Updated 7 months ago
- VPTQ: a flexible and extreme low-bit quantization algorithm ☆646 · Updated 2 months ago
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆525 · Updated this week
- llama.cpp tutorial on an Android phone ☆110 · Updated last month
- On-device LLM Inference Powered by X-Bit Quantization ☆249 · Updated 2 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 9 months ago
- TinyChatEngine: On-Device LLM Inference Library ☆865 · Updated 11 months ago
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆648 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆264 · Updated 8 months ago
- LLM Inference on consumer devices ☆119 · Updated 3 months ago
- ☆35 · Updated 2 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆213 · Updated 7 months ago
- The homepage of the OneBit model quantization framework. ☆181 · Updated 4 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆359 · Updated 10 months ago
- ☆213 · Updated 5 months ago
- Awesome list for LLM quantization ☆238 · Updated 2 weeks ago
- Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU/GPU via HF, vLLM, and SGLa… ☆633 · Updated this week
- Simple extension to vLLM that helps you speed up reasoning models without training. ☆161 · Updated 3 weeks ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆708 · Updated 3 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆348 · Updated 5 months ago
- Analyzes the inference of Large Language Models (LLMs): computation, storage, transmission, and hardware roofline mod… (a minimal roofline sketch follows this list) ☆487 · Updated 9 months ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024). ☆1,303 · Updated 2 months ago
- ☆95 · Updated 8 months ago
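
The roofline-analysis entry above captures the idea behind much of this list: batch-1 on-device decoding is usually bounded by memory bandwidth, not compute. Below is a minimal sketch of that roofline estimate; all model and hardware figures are illustrative assumptions, not numbers taken from any listed repository.

```python
# Hedged sketch: a roofline-style bound for batch-1 LLM decoding.
# Model size and hardware specs below are assumed, illustrative values.

def decode_tokens_per_second(n_params: float, bytes_per_weight: float,
                             peak_flops: float, mem_bw: float) -> float:
    """Upper-bound tokens/s for autoregressive decoding at batch size 1.

    Each generated token reads every weight once (~n_params * bytes_per_weight
    bytes) and does ~2 FLOPs per parameter (one multiply, one add), so
    throughput is capped by min(compute roof, memory roof).
    """
    flops_per_token = 2.0 * n_params               # multiply-accumulate per weight
    bytes_per_token = n_params * bytes_per_weight  # stream all weights once
    compute_bound = peak_flops / flops_per_token   # tokens/s if compute-limited
    memory_bound = mem_bw / bytes_per_token        # tokens/s if bandwidth-limited
    return min(compute_bound, memory_bound)

# Example (assumed figures): 7B model with 4-bit weights (0.5 B/weight) on a
# phone-class SoC with ~2 TFLOP/s fp16 and ~50 GB/s DRAM bandwidth.
print(decode_tokens_per_second(7e9, 0.5, 2e12, 50e9))  # ~14 tok/s, memory-bound
```

At fp16 the arithmetic intensity of batch-1 decoding is only about 2 FLOPs per weight byte, far below what mobile SoCs need to reach their compute roof, which is why so many repositories above focus on low-bit weight quantization: halving bytes per weight roughly doubles the memory-bound decoding rate.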