stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆184 · Updated last month
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm are comparing it to the libraries listed below
- Fast Multimodal LLM on Mobile Devices ☆849 · Updated last month
- ☆209 · Updated 3 months ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆103 · Updated last month
- Awesome list for LLM quantization ☆213 · Updated 4 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆272 · Updated 3 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆209 · Updated 5 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆345 · Updated 3 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆56 · Updated 7 months ago
- Code releases for compression methods for transformers, accompanying our publications ☆427 · Updated 3 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆336 · Updated 6 months ago
- Advanced Quantization Algorithm for LLMs/VLMs ☆460 · Updated this week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆294 · Updated last week
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆216 · Updated 4 months ago
- KV cache compression for high-throughput LLM inference ☆126 · Updated 3 months ago
- ☆56 · Updated 5 months ago
- ☆54 · Updated 2 weeks ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆266 · Updated 7 months ago
- ☆131 · Updated last month
- MobiLlama: Small Language Model tailored for edge devices ☆636 · Updated this week
- Awesome LLMs on Device: A Comprehensive Survey ☆1,093 · Updated 4 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆295 · Updated 3 months ago
- Fast low-bit matmul kernels in Triton ☆299 · Updated this week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆357 · Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆263 · Updated 7 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated 8 months ago
- ☆532 · Updated 6 months ago
- The homepage of the OneBit model quantization framework ☆176 · Updated 3 months ago
- On-device LLM Inference Powered by X-Bit Quantization ☆238 · Updated this week
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆352 · Updated 9 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆199 · Updated 9 months ago