Fast Multimodal LLM on Mobile Devices
☆1,437 · Mar 18, 2026 · Updated this week
Alternatives and similar repositories for mllm
Users that are interested in mllm are comparing it to the libraries listed below
- ☆67 · Nov 16, 2024 · Updated last year
- ☆43 · Mar 29, 2025 · Updated 11 months ago
- One-size-fits-all model for mobile AI: a novel paradigm in which the OS and hardware co-manage a foundation model that is c… ☆30 · Mar 5, 2024 · Updated 2 years ago
- Survey Paper List - Efficient LLM and Foundation Models ☆264 · Sep 22, 2024 · Updated last year
- Self-implemented NN operators for Qualcomm's Hexagon NPU ☆50 · Sep 30, 2025 · Updated 5 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆932 · Jun 5, 2025 · Updated 9 months ago
- ☆132 · Feb 12, 2026 · Updated last month
- The original reference implementation of a specified llama.cpp backend for the Qualcomm Hexagon NPU on Android phones, https://github.com/ggml… ☆37 · Jul 14, 2025 · Updated 8 months ago
- ☆212 · Jan 17, 2024 · Updated 2 years ago
- LLM inference in C/C++ ☆48 · Mar 14, 2026 · Updated last week
- Inference RWKV v5, v6 and v7 with the Qualcomm AI Engine Direct SDK ☆90 · Feb 14, 2026 · Updated last month
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆818 · Mar 6, 2025 · Updated last year
- Let's use the Qualcomm NPU on Android ☆17 · Feb 18, 2025 · Updated last year
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) a… ☆387 · Mar 13, 2026 · Updated last week
- A demo of an end-to-end federated learning system. ☆69 · Jun 1, 2022 · Updated 3 years ago
- 📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉 ☆5,062 · Updated this week
- Universal LLM Deployment Engine with ML Compilation ☆22,246 · Updated this week
- Paper list for Personal LLM Agents ☆426 · May 8, 2024 · Updated last year
- High-speed and easy-to-use LLM serving framework for local deployment ☆145 · Aug 7, 2025 · Updated 7 months ago
- LLM deployment project based on MNN. This project has been merged into MNN. ☆1,617 · Jan 20, 2025 · Updated last year
- ☆102 · Jan 17, 2024 · Updated 2 years ago
- Awesome Mobile LLMs ☆313 · Updated this week
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,345 · Apr 15, 2024 · Updated last year
- ☆13 · Feb 7, 2026 · Updated last month
- Demonstration of running a native LLM on an Android device. ☆236 · Mar 14, 2026 · Updated last week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆66 · Sep 22, 2024 · Updated last year
- RROS is a dual-kernel OS for satellites or other scenarios that need both real-time and general-purpose abilities. RROS = RTOS (Rust) + … ☆680 · Jan 3, 2025 · Updated last year
- Our unique contributions are in tools/train/benchmark. ☆21 · Apr 14, 2025 · Updated 11 months ago
- On-device AI across mobile, embedded and edge for PyTorch ☆4,386 · Updated this week
- MobiSys#114 ☆23 · Aug 17, 2023 · Updated 2 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022] ☆19 · Aug 4, 2022 · Updated 3 years ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation ☆67 · Aug 9, 2024 · Updated last year
- Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) an… ☆946 · Updated this week
- High-speed Large Language Model Serving for Local Deployment ☆8,834 · Jan 24, 2026 · Updated last month
- MNN ASR demo. ☆25 · Mar 24, 2025 · Updated 11 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,463 · Jul 17, 2025 · Updated 8 months ago
- TinyChatEngine: On-Device LLM Inference Library ☆945 · Jul 4, 2024 · Updated last year
- LLM inference in C/C++ ☆20 · Oct 22, 2025 · Updated 4 months ago
- MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI. ☆14,533 · Mar 13, 2026 · Updated last week