High-speed and easy-to-use LLM serving framework for local deployment
☆145Aug 7, 2025Updated 7 months ago
Alternatives and similar repositories for PowerServe
Users that are interested in PowerServe are comparing it to the libraries listed below.
- Bamboo-7B Large Language Model☆93Mar 28, 2024Updated last year
- ☆63Dec 16, 2025Updated 3 months ago
- ☆74Oct 6, 2023Updated 2 years ago
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆91Feb 14, 2026Updated last month
- Run Chinese MobileBert model on SNPE.☆14May 19, 2023Updated 2 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models☆67Sep 22, 2024Updated last year
- The project has moved to github.com/SJTU-IPADS/ServerlessBench. An open-source benchmark suite for serverless computing☆22May 20, 2022Updated 3 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]☆19Aug 4, 2022Updated 3 years ago
- Mic-controlled mouse clicks☆17Oct 6, 2025Updated 5 months ago
- ☆48Jul 30, 2025Updated 7 months ago
- Fast Multimodal LLM on Mobile Devices☆1,437Mar 18, 2026Updated last week
- C++ implementations for various tokenizers (sentencepiece, tiktoken etc).☆48Mar 12, 2026Updated 2 weeks ago
- A fully autonomous agent that accesses the browser and performs tasks.☆18Apr 25, 2025Updated 11 months ago
- High-speed Large Language Model Serving for Local Deployment☆9,060Jan 24, 2026Updated 2 months ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai☆124Mar 18, 2026Updated last week
- Example apps for LeapSDK☆59Mar 12, 2026Updated 2 weeks ago
- Study materials collected while studying☆51Apr 16, 2022Updated 3 years ago
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …☆142Mar 18, 2026Updated last week
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker.☆38Jul 2, 2025Updated 8 months ago
- Project is intended to build and deploy a scene detection application onto Qualcomm Robotics development Kit (RB5) that detects whether …☆10Jun 26, 2022Updated 3 years ago
- Create text chunks which end at natural stopping points without using a tokenizer☆26Nov 26, 2025Updated 4 months ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- A Triton JIT runtime and ffi provider in C++☆32Updated this week
- SJTU SE3357 Operating Systems Notes☆19Jun 4, 2023Updated 2 years ago
- ☆21Oct 2, 2024Updated last year
- ☆43Mar 29, 2025Updated 11 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆34Aug 14, 2024Updated last year
- A simple agent powered by LLMs that performs tasks.☆14Apr 25, 2025Updated 11 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass.☆31May 1, 2025Updated 10 months ago
- A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem.☆23Oct 11, 2024Updated last year
- A powerful and user-friendly tool that generates detailed captions for your images☆21Nov 11, 2024Updated last year
- KV cache compression for high-throughput LLM inference☆153Feb 5, 2025Updated last year
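- (Under construction...👷) Front-end demo project for the Shanghai Jiao Tong University course "Internet Application Development Technology (SE2321)", provided for students to study and reference.☆89May 19, 2025Updated 10 months ago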
- Low-bit LLM inference on CPU/NPU with lookup table☆941Jun 5, 2025Updated 9 months ago
- MobiSys#114☆23Aug 17, 2023Updated 2 years ago
- a single-header math library☆17Nov 7, 2025Updated 4 months ago
- The artifact for NDSS '25 paper "ASGARD: Protecting On-Device Deep Neural Networks with Virtualization-Based Trusted Execution Environmen…☆15Oct 16, 2025Updated 5 months ago
- AirLLM 70B inference with single 4GB GPU☆20Jun 27, 2025Updated 8 months ago
- ☆184Jan 22, 2026Updated 2 months ago