Fast Multimodal LLM on Mobile Devices
☆1,477Apr 12, 2026Updated 2 weeks ago
Alternatives and similar repositories for mllm
Users that are interested in mllm are comparing it to the libraries listed below.
- ☆67Nov 16, 2024Updated last year
- ☆43Mar 29, 2025Updated last year
- One-size-fits-all model for mobile AI, a novel paradigm for mobile AI in which the OS and hardware co-manage a foundation model that is c…☆30Mar 5, 2024Updated 2 years ago
- Survey Paper List - Efficient LLM and Foundation Models☆266Sep 22, 2024Updated last year
- Low-bit LLM inference on CPU/NPU with lookup table☆953Jun 5, 2025Updated 10 months ago
- ☆141Apr 9, 2026Updated 3 weeks ago
- the original reference implementation of a specified llama.cpp backend for Qualcomm Hexagon NPU on Android phone, https://github.com/ggml…☆39Jul 14, 2025Updated 9 months ago
- ☆213Jan 17, 2024Updated 2 years ago
- LLM inference in C/C++☆51Updated this week
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆91Feb 14, 2026Updated 2 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…☆834Mar 6, 2025Updated last year
- Self-implemented NN operators for Qualcomm's Hexagon NPU☆64Sep 30, 2025Updated 7 months ago
- Let's use Qualcomm NPU in Android☆18Feb 18, 2025Updated last year
- A demo of end-to-end federated learning system.☆69Jun 1, 2022Updated 3 years ago
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) a…☆404Updated this week
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉☆5,185Apr 20, 2026Updated last week
- Universal LLM Deployment Engine with ML Compilation☆22,517Apr 22, 2026Updated last week
- High-speed and easy-use LLM serving framework for local deployment☆148Aug 7, 2025Updated 8 months ago
- Paper list for Personal LLM Agents☆428May 8, 2024Updated last year
- LLM deployment project based on MNN. This project has been merged into MNN.☆1,614Jan 20, 2025Updated last year
- ☆102Jan 17, 2024Updated 2 years ago
- Awesome Mobile LLMs☆328Apr 6, 2026Updated 3 weeks ago
- Strong and Open Vision Language Assistant for Mobile Devices☆1,350Apr 15, 2024Updated 2 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models☆68Sep 22, 2024Updated last year
- Demonstration of running a native LLM on Android device.☆246Apr 12, 2026Updated 2 weeks ago
- Our unique contributions are in tools/train/benchmark.☆22Apr 14, 2025Updated last year
- RROS is a dual-kernel OS for satellites or other scenarios that need both real-time and general-purpose abilities. RROS = RTOS (Rust) + …☆686Jan 3, 2025Updated last year
- On-device AI across mobile, embedded and edge for PyTorch☆4,547Updated this week
- MobiSys#114☆23Aug 17, 2023Updated 2 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]☆19Aug 4, 2022Updated 3 years ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation☆67Aug 9, 2024Updated last year
- Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) an…☆1,009Updated this week
- High-speed Large Language Model Serving for Local Deployment☆9,390Jan 24, 2026Updated 3 months ago
- MNN ASR demo.☆26Mar 24, 2025Updated last year
- ☆22Apr 17, 2026Updated last week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration☆3,512Jul 17, 2025Updated 9 months ago
- TinyChatEngine: On-Device LLM Inference Library☆953Jul 4, 2024Updated last year
- LLM inference in C/C++☆21Oct 22, 2025Updated 6 months ago
- llm-export can export LLM models to ONNX.☆350Oct 24, 2025Updated 6 months ago