stevelaskaridis / awesome-mobile-llm
Awesome Mobile LLMs
☆253 · Updated 2 weeks ago
Alternatives and similar repositories for awesome-mobile-llm
Users interested in awesome-mobile-llm often compare it to the libraries listed below.
- ☆63 · Updated 11 months ago
- Fast Multimodal LLM on Mobile Devices ☆1,105 · Updated last week
- High-speed and easy-to-use LLM serving framework for local deployment ☆124 · Updated 2 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆68 · Updated last year
- TinyChatEngine: On-Device LLM Inference Library ☆903 · Updated last year
- 1.58-bit LLM on Apple Silicon using MLX ☆224 · Updated last year
- On-device LLM Inference Powered by X-Bit Quantization ☆269 · Updated 2 months ago
- Awesome list for LLM quantization ☆318 · Updated last week
- Code releases for compression methods for transformers, accompanying our publications ☆446 · Updated 9 months ago
- ☆218 · Updated 8 months ago
- A read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆88 · Updated last week
- 🤗 Optimum ExecuTorch ☆69 · Updated last week
- The homepage of the OneBit model quantization framework ☆193 · Updated 8 months ago
- VPTQ: a flexible, extreme low-bit quantization algorithm ☆659 · Updated 5 months ago
- A scalable and robust tree-based speculative decoding algorithm ☆359 · Updated 8 months ago
- Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU ☆659 · Updated this week
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆306 · Updated 4 months ago
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods ☆195 · Updated 9 months ago
- LLM-Inference-Bench ☆55 · Updated 3 months ago
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! ☆451 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆237 · Updated 11 months ago
- [ICLR 2025 SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆663 · Updated 5 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆228 · Updated 9 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆343 · Updated 5 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆335 · Updated 8 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆171 · Updated this week
- ☆166 · Updated last week
- ☆39 · Updated 6 months ago
- ☆69 · Updated 2 months ago