High-speed, easy-to-use LLM serving framework for local deployment
☆147 · Aug 7, 2025 · Updated 8 months ago
Alternatives and similar repositories for PowerServe
Users interested in PowerServe are comparing it to the libraries listed below.
- Self-implemented NN operators for Qualcomm's Hexagon NPU ☆61 · Sep 30, 2025 · Updated 6 months ago
- ☆74 · Oct 6, 2023 · Updated 2 years ago
- Project for QNN quantization and CPU inference of YOLOv5 in the Qualcomm AI Engine Direct environment ☆16 · Sep 10, 2024 · Updated last year
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆91 · Feb 14, 2026 · Updated 2 months ago
- Run Chinese MobileBert model on SNPE. ☆15 · May 19, 2023 · Updated 2 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆67 · Sep 22, 2024 · Updated last year
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022] ☆19 · Aug 4, 2022 · Updated 3 years ago
- Mic-controlled mouse clicks ☆17 · Oct 6, 2025 · Updated 6 months ago
- Fast Multimodal LLM on Mobile Devices ☆1,463 · Mar 29, 2026 · Updated 2 weeks ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc.) ☆49 · Apr 1, 2026 · Updated 2 weeks ago
- ☆18 · Updated this week
- A fully autonomous agent that accesses the browser and performs tasks. ☆18 · Apr 25, 2025 · Updated 11 months ago
- High-speed Large Language Model Serving for Local Deployment ☆9,324 · Jan 24, 2026 · Updated 2 months ago
- Milk-V Duo: Internet access through a USB RNDIS connection to the host machine ☆16 · Jan 11, 2024 · Updated 2 years ago
- Study materials collected while studying ☆51 · Apr 16, 2022 · Updated 4 years ago
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime … ☆147 · Updated this week
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ☆38 · Jul 2, 2025 · Updated 9 months ago
- Project intended to build and deploy a scene detection application on the Qualcomm Robotics Development Kit (RB5) that detects whether … ☆10 · Jun 26, 2022 · Updated 3 years ago
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 · Nov 26, 2025 · Updated 4 months ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 · Aug 30, 2024 · Updated last year
- A Triton JIT runtime and FFI provider in C++ ☆32 · Updated this week
- ☆21 · Oct 2, 2024 · Updated last year
- My implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆34 · Aug 14, 2024 · Updated last year
- ☆43 · Mar 29, 2025 · Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆31 · May 1, 2025 · Updated 11 months ago
- A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem. ☆23 · Oct 11, 2024 · Updated last year
- Experimental interface environment for open source LLM, designed to democratize the use of AI. Powered by llama-cpp, llama-cpp-python and… ☆18 · Oct 11, 2025 · Updated 6 months ago
- A powerful and user-friendly tool that generates detailed captions for your images ☆21 · Nov 11, 2024 · Updated last year
- ☆13 · Jan 7, 2025 · Updated last year
- KV cache compression for high-throughput LLM inference ☆155 · Feb 5, 2025 · Updated last year
- The original reference implementation of a specified llama.cpp backend for Qualcomm Hexagon NPU on Android phones, https://github.com/ggml… ☆38 · Jul 14, 2025 · Updated 9 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆946 · Jun 5, 2025 · Updated 10 months ago
- ☆18 · Jan 27, 2025 · Updated last year
- MobiSys#114 ☆23 · Aug 17, 2023 · Updated 2 years ago
- a single-header math library ☆17 · Nov 7, 2025 · Updated 5 months ago
- AirLLM 70B inference with single 4GB GPU ☆20 · Jun 27, 2025 · Updated 9 months ago
- ☆187 · Jan 22, 2026 · Updated 2 months ago
- Personal voice assistant, with voice interruption and Twilio support ☆18 · Feb 24, 2025 · Updated last year
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆27 · Dec 17, 2024 · Updated last year