Fast Multimodal LLM on Mobile Devices
☆1,508 · Apr 30, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for mllm
Users that are interested in mllm are comparing it to the libraries listed below.
- ☆67 · Nov 16, 2024 · Updated last year
- ☆43 · Mar 29, 2025 · Updated last year
- One-size-fits-all model for mobile AI, a novel paradigm for mobile AI in which the OS and hardware co-manage a foundation model that is c… · ☆30 · Mar 5, 2024 · Updated 2 years ago
- Survey Paper List - Efficient LLM and Foundation Models · ☆266 · Sep 22, 2024 · Updated last year
- Low-bit LLM inference on CPU/NPU with lookup table · ☆955 · Jun 5, 2025 · Updated 11 months ago
- ☆151 · May 3, 2026 · Updated 2 weeks ago
- the original reference implementation of a specified llama.cpp backend for Qualcomm Hexagon NPU on Android phone, https://github.com/ggml… · ☆42 · Jul 14, 2025 · Updated 10 months ago
- ☆215 · Jan 17, 2024 · Updated 2 years ago
- LLM inference in C/C++ · ☆52 · Updated this week
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK · ☆91 · Updated this week
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… · ☆838 · Mar 6, 2025 · Updated last year
- Self-implemented NN operators for Qualcomm's Hexagon NPU · ☆68 · Sep 30, 2025 · Updated 7 months ago
- Let's use Qualcomm NPU in Android · ☆20 · Feb 18, 2025 · Updated last year
- A demo of end-to-end federated learning system. · ☆69 · Jun 1, 2022 · Updated 3 years ago
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉 · ☆5,229 · Apr 20, 2026 · Updated last month
- Universal LLM Deployment Engine with ML Compilation · ☆22,633 · May 11, 2026 · Updated last week
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) a… · ☆409 · Updated this week
- Paper list for Personal LLM Agents · ☆428 · May 8, 2024 · Updated 2 years ago
- High-speed and easy-use LLM serving framework for local deployment · ☆153 · Aug 7, 2025 · Updated 9 months ago
- LLM deployment project based on MNN. This project has been merged into MNN. · ☆1,616 · Jan 20, 2025 · Updated last year
- ☆102 · Jan 17, 2024 · Updated 2 years ago
- Awesome Mobile LLMs · ☆343 · May 1, 2026 · Updated 2 weeks ago
- Strong and Open Vision Language Assistant for Mobile Devices · ☆1,353 · Apr 15, 2024 · Updated 2 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models · ☆68 · Sep 22, 2024 · Updated last year
- Demonstration of running a native LLM on Android device. · ☆249 · May 14, 2026 · Updated last week
- Our unique contributions are in tools/train/benchmark. · ☆22 · Apr 14, 2025 · Updated last year
- RROS is a dual-kernel OS for satellites or other scenarios that need both real-time and general-purpose abilities. RROS = RTOS (Rust) + … · ☆686 · Jan 3, 2025 · Updated last year
- On-device AI across mobile, embedded and edge for PyTorch · ☆4,622 · Updated this week
- MobiSys#114 · ☆23 · Aug 17, 2023 · Updated 2 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022] · ☆19 · Aug 4, 2022 · Updated 3 years ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation · ☆69 · Aug 9, 2024 · Updated last year
- Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) an… · ☆1,045 · Updated this week
- High-speed Large Language Model Serving for Local Deployment · ☆9,469 · May 11, 2026 · Updated last week
- mnn asr demo. · ☆27 · Mar 24, 2025 · Updated last year
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ☆3,536 · Jul 17, 2025 · Updated 10 months ago
- TinyChatEngine: On-Device LLM Inference Library · ☆952 · Jul 4, 2024 · Updated last year
- LLM inference in C/C++ · ☆21 · Oct 22, 2025 · Updated 6 months ago
- llm-export can export LLM models to ONNX. · ☆352 · May 8, 2026 · Updated last week
- MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI. · ☆15,169 · May 12, 2026 · Updated last week