stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆241 · Updated last month
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the libraries listed below.
- TinyChatEngine: On-Device LLM Inference Library ☆889 · Updated last year
- Fast Multimodal LLM on Mobile Devices ☆1,024 · Updated this week
- ☆62 · Updated 9 months ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆118 · Updated 3 weeks ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆67 · Updated 11 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated last year
- Awesome list for LLM quantization ☆291 · Updated this week
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆193 · Updated 7 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆442 · Updated 7 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆355 · Updated 7 months ago
- A curated list of high-quality papers on resource-efficient LLMs 🌱 ☆134 · Updated 5 months ago
- Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆611 · Updated this week
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆653 · Updated 4 months ago
- ☆217 · Updated 7 months ago
- 1.58-bit LLM on Apple Silicon using MLX ☆221 · Updated last year
- ☆38 · Updated 5 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆226 · Updated 9 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆298 · Updated 3 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆224 · Updated 7 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆331 · Updated 4 months ago
- ☆96 · Updated 11 months ago
- LLM-Inference-Bench ☆50 · Updated last month
- Low-bit LLM inference on CPU/NPU with lookup table ☆845 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 10 months ago
- LLM Inference on consumer devices ☆124 · Updated 5 months ago
- On-device LLM Inference Powered by X-Bit Quantization ☆267 · Updated 3 weeks ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆318 · Updated 6 months ago
- ☆554 · Updated 10 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆874 · Updated 2 weeks ago
- Open-source calculator for LLM system requirements ☆167 · Updated 8 months ago
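Many of the repositories above center on low-bit weight quantization for on-device inference. As a rough, self-contained illustration of the core idea (not the implementation of any listed project, which all use optimized kernels, per-group scales, and more sophisticated rounding), here is a minimal sketch of symmetric per-tensor 4-bit quantization; all function names are illustrative:

```python
# Minimal sketch of symmetric 4-bit post-training quantization.
# Illustrative only: real libraries quantize per-group/per-channel
# and store packed integers, not Python lists.

def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] with one per-tensor scale."""
    # Scale so the largest magnitude maps to +/-7; fall back to 1.0 for all-zero input.
    scale = (max(abs(w) for w in weights) / 7.0) or 1.0
    # Round to the nearest integer step and clamp into the signed 4-bit range.
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

w = [0.1, -0.5, 1.0, -0.1]
q, s = quantize_4bit(w)      # q == [1, -4, 7, -1]
w_hat = dequantize(q, s)     # each entry within scale/2 of the original
```

At 4 bits per weight the quantization error is bounded by half a step (`scale / 2`), which is why the listed projects invest in smarter scale selection (per-group, learned rotations, codebooks) to shrink that step for a fixed bit budget.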