High-speed and easy-to-use LLM serving framework for local deployment
☆153 Aug 7, 2025 Updated 9 months ago
Alternatives and similar repositories for PowerServe
Users that are interested in PowerServe are comparing it to the libraries listed below.
- Bamboo-7B Large Language Model ☆94 Mar 28, 2024 Updated 2 years ago
- Self-implemented NN operators for Qualcomm's Hexagon NPU ☆69 Sep 30, 2025 Updated 7 months ago
- ☆76 Oct 6, 2023 Updated 2 years ago
- A project for QNN quantization and CPU inference of YOLOv5 in the Qualcomm AI Engine Direct environment ☆17 Sep 10, 2024 Updated last year
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆91 May 17, 2026 Updated last week
- Run Chinese MobileBert model on SNPE. ☆15 May 19, 2023 Updated 3 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆68 Sep 22, 2024 Updated last year
- The project now is moved to github.com/SJTU-IPADS/ServerlessBench. An open-sourced benchmark suite for serverless computing ☆22 May 20, 2022 Updated 4 years ago
- LLM inference in C/C++ ☆52 Updated this week
- Mic-controlled mouse clicks ☆17 Oct 6, 2025 Updated 7 months ago
- Fast Multimodal LLM on Mobile Devices ☆1,515 Apr 30, 2026 Updated 3 weeks ago
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc). ☆49 May 18, 2026 Updated last week
- ☆27 Updated this week
- High-speed Large Language Model Serving for Local Deployment ☆9,469 May 11, 2026 Updated 2 weeks ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆149 Updated this week
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 Nov 26, 2025 Updated 6 months ago
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ☆38 Jul 2, 2025 Updated 10 months ago
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime … ☆161 Updated this week
- LLM Inference on consumer devices ☆132 Mar 17, 2025 Updated last year
- A Triton JIT runtime and ffi provider in C++ ☆35 Updated this week
- ☆21 Oct 2, 2024 Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆37 Aug 14, 2024 Updated last year
- ☆43 Mar 29, 2025 Updated last year
- A simple agent powered by LLMs that performs tasks. ☆15 Apr 25, 2025 Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆31 May 1, 2025 Updated last year
- A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem. ☆23 Oct 11, 2024 Updated last year
- A powerful and user-friendly tool that generates detailed captions for your images ☆21 Nov 11, 2024 Updated last year
- Experimental interface environment for open source LLM, designed to democratize the use of AI. Powered by llama-cpp, llama-cpp-python and… ☆18 Oct 11, 2025 Updated 7 months ago
- ☆13 Jan 7, 2025 Updated last year
- A note taking app based around GPUI. gpui-notes takes inspiration from note apps like Obsidian and Trilium. ☆14 Feb 8, 2024 Updated 2 years ago
- KV cache compression for high-throughput LLM inference ☆157 Feb 5, 2025 Updated last year
- Low-bit LLM inference on CPU/NPU with lookup table ☆957 Jun 5, 2025 Updated 11 months ago
- ☆19 Jan 27, 2025 Updated last year
- MobiSys#114 ☆23 Aug 17, 2023 Updated 2 years ago
- The artifact for NDSS '25 paper "ASGARD: Protecting On-Device Deep Neural Networks with Virtualization-Based Trusted Execution Environmen… ☆15 Oct 16, 2025 Updated 7 months ago
- The objective of this repository is to create an android application which would have the deep model trained and converted to the Qualcom… ☆20 Jun 12, 2019 Updated 6 years ago
- AirLLM 70B inference with single 4GB GPU ☆20 Jun 27, 2025 Updated 10 months ago
- Personal voice assistant, with voice interruption and Twilio support ☆18 Feb 24, 2025 Updated last year
- ☆191 Apr 24, 2026 Updated last month