Fast Multimodal LLM on Mobile Devices
☆1,401 · Feb 20, 2026 · Updated last week
Alternatives and similar repositories for mllm
Users interested in mllm are comparing it to the repositories listed below.
- ☆66 · Nov 16, 2024 · Updated last year
- ☆43 · Mar 29, 2025 · Updated 11 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆924 · Jun 5, 2025 · Updated 8 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆260 · Sep 22, 2024 · Updated last year
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆90 · Feb 14, 2026 · Updated 2 weeks ago
- One-size-fits-all model for mobile AI, a novel paradigm for mobile AI in which the OS and hardware co-manage a foundation model that is c… ☆30 · Mar 5, 2024 · Updated last year
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆816 · Mar 6, 2025 · Updated 11 months ago
- The original reference implementation of a dedicated llama.cpp backend for the Qualcomm Hexagon NPU on Android phones, https://github.com/ggml… ☆35 · Jul 14, 2025 · Updated 7 months ago
- Self-implemented NN operators for Qualcomm's Hexagon NPU ☆48 · Sep 30, 2025 · Updated 5 months ago
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,334 · Apr 15, 2024 · Updated last year
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉 ☆5,022 · Updated this week
- ☆123 · Feb 12, 2026 · Updated 2 weeks ago
- ☆212 · Jan 17, 2024 · Updated 2 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆67 · Sep 22, 2024 · Updated last year
- LLM inference in C/C++ ☆48 · Feb 21, 2026 · Updated last week
- Universal LLM Deployment Engine with ML Compilation ☆22,061 · Feb 18, 2026 · Updated last week
- LLM deployment project based on MNN; it has been merged into MNN. ☆1,614 · Jan 20, 2025 · Updated last year
- Demonstration of running a native LLM on an Android device. ☆226 · Updated this week
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) a… ☆377 · Feb 13, 2026 · Updated 2 weeks ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆146 · Aug 7, 2025 · Updated 6 months ago
- On-device AI across mobile, embedded and edge for PyTorch ☆4,312 · Updated this week
- Let's use the Qualcomm NPU in Android ☆18 · Feb 18, 2025 · Updated last year
- MNN ASR demo ☆25 · Mar 24, 2025 · Updated 11 months ago
- Awesome Mobile LLMs ☆304 · Feb 8, 2026 · Updated 2 weeks ago
- TinyChatEngine: On-Device LLM Inference Library ☆942 · Jul 4, 2024 · Updated last year
- Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) an… ☆925 · Updated this week
- llm-export can export LLM models to ONNX. ☆344 · Oct 24, 2025 · Updated 4 months ago
- Paper list for Personal LLM Agents ☆424 · May 8, 2024 · Updated last year
- High-speed Large Language Model Serving for Local Deployment ☆8,729 · Jan 24, 2026 · Updated last month
- ☆102 · Jan 17, 2024 · Updated 2 years ago
- ☆11 · Feb 7, 2026 · Updated 3 weeks ago
- MobiSys#114 ☆23 · Aug 17, 2023 · Updated 2 years ago
- MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM … ☆14,248 · Feb 16, 2026 · Updated last week
- A lightweight LLM inference framework ☆747 · Apr 7, 2024 · Updated last year
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime … ☆125 · Updated this week
- LLM inference in C/C++ ☆20 · Oct 22, 2025 · Updated 4 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆273 · Aug 6, 2025 · Updated 6 months ago
- ☆78 · May 28, 2023 · Updated 2 years ago
- Row-major matmul optimization ☆703 · Updated this week