High-speed and easy-to-use LLM serving framework for local deployment
☆155Aug 7, 2025Updated 10 months ago
Alternatives and similar repositories for PowerServe
Users that are interested in PowerServe are comparing it to the libraries listed below.
- Bamboo-7B Large Language Model☆94Mar 28, 2024Updated 2 years ago
- Self-implemented NN operators for Qualcomm's Hexagon NPU☆70Sep 30, 2025Updated 8 months ago
- ☆83Dec 16, 2025Updated 6 months ago
- A project for QNN quantization and CPU inference of YOLOv5 in the Qualcomm AI Engine Direct environment☆17Sep 10, 2024Updated last year
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆93Jun 8, 2026Updated last week
- Run Chinese MobileBert model on SNPE.☆15May 19, 2023Updated 3 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models☆68Sep 22, 2024Updated last year
- This project has moved to github.com/SJTU-IPADS/ServerlessBench. An open-source benchmark suite for serverless computing☆22May 20, 2022Updated 4 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]☆19Aug 4, 2022Updated 3 years ago
- LLM inference in C/C++☆52Jun 9, 2026Updated last week
- Mic-controlled mouse clicks☆17Oct 6, 2025Updated 8 months ago
- Fast Multimodal LLM on Mobile Devices☆1,540Jun 9, 2026Updated last week
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc).☆49Updated this week
- ☆12Sep 22, 2024Updated last year
- High-speed Large Language Model Serving for Local Deployment☆9,548May 11, 2026Updated last month
- Milk-V Duo. Internet access through a USB RNDIS connection to the host machine☆16Jan 11, 2024Updated 2 years ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai☆159Updated this week
- Create text chunks which end at natural stopping points without using a tokenizer☆26Nov 26, 2025Updated 6 months ago
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker.☆38Jul 2, 2025Updated 11 months ago
- Project is intended to build and deploy a scene detection application onto the Qualcomm Robotics Development Kit (RB5) that detects whether …☆10Jun 26, 2022Updated 3 years ago
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …☆173Updated this week
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- A Triton JIT runtime and ffi provider in C++☆35May 27, 2026Updated 2 weeks ago
- SJTU SE3357 Operating Systems Notes☆17Jun 4, 2023Updated 3 years ago
- LLM Inference on consumer devices☆131Mar 17, 2025Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆37Aug 14, 2024Updated last year
- ☆43Mar 29, 2025Updated last year
- A simple agent powered by LLMs that performs tasks.☆15Apr 25, 2025Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass.☆32May 1, 2025Updated last year
- A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem.☆23Oct 11, 2024Updated last year
- Experimental interface environment for open source LLM, designed to democratize the use of AI. Powered by llama-cpp, llama-cpp-python and…☆18Oct 11, 2025Updated 8 months ago
- ☆13Jan 7, 2025Updated last year
- KV cache compression for high-throughput LLM inference☆157Feb 5, 2025Updated last year
- the original reference implementation of a specified llama.cpp backend for Qualcomm Hexagon NPU on Android phone, https://github.com/ggml…☆45Updated this week
- matmul using AMX instructions☆24May 7, 2024Updated 2 years ago
- ☆19Jan 27, 2025Updated last year
- Low-bit LLM inference on CPU/NPU with lookup table☆965Jun 5, 2025Updated last year
- MobiSys#114☆23Aug 17, 2023Updated 2 years ago
- a single-header math library☆17Nov 7, 2025Updated 7 months ago