A high-speed, easy-to-use LLM serving framework for local deployment
☆148 · Aug 7, 2025 · Updated 6 months ago
Alternatives and similar repositories for PowerServe
Users interested in PowerServe are comparing it to the libraries listed below.
- Bamboo-7B Large Language Model ☆93 · Mar 28, 2024 · Updated last year
- Self-implemented NN operators for Qualcomm's Hexagon NPU ☆49 · Sep 30, 2025 · Updated 5 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆67 · Sep 22, 2024 · Updated last year
- A project for QNN quantization and CPU inference of YOLOv5 in the Qualcomm AI Engine Direct environment ☆16 · Sep 10, 2024 · Updated last year
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆90 · Feb 14, 2026 · Updated 2 weeks ago
- Run Chinese MobileBert model on SNPE. ☆15 · May 19, 2023 · Updated 2 years ago
- LLM inference in C/C++ ☆48 · Updated this week
- A fully autonomous agent that accesses the browser and performs tasks. ☆17 · Apr 25, 2025 · Updated 10 months ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022] ☆19 · Aug 4, 2022 · Updated 3 years ago
- Fast Multimodal LLM on Mobile Devices ☆1,412 · Updated this week
- A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem. ☆23 · Oct 11, 2024 · Updated last year
- Inference deployment of Llama 3 ☆11 · Apr 21, 2024 · Updated last year
- ☆13 · Jan 7, 2025 · Updated last year
- ☆11 · Feb 7, 2026 · Updated 3 weeks ago
- This project builds and deploys a scene detection application onto the Qualcomm Robotics Development Kit (RB5) that detects whether … ☆10 · Jun 26, 2022 · Updated 3 years ago
- A thin Cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp ☆16 · Feb 10, 2026 · Updated 3 weeks ago
- ☆15 · Apr 9, 2025 · Updated 10 months ago
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 · Nov 26, 2025 · Updated 3 months ago
- OpenAI compatible API for open source LLMs ☆16 · Oct 30, 2023 · Updated 2 years ago
- An interface with almost no external dependencies beyond the Ollama API itself, making it lightweight and portable to easily i… ☆12 · Mar 25, 2025 · Updated 11 months ago
- Visual Tagger is a JavaScript tool that visually highlights HTML elements for AIs, aiding in identifying interactive components on web pa… ☆11 · Oct 28, 2024 · Updated last year
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime … ☆137 · Updated this week
- The project now is moved to github.com/SJTU-IPADS/ServerlessBench. An open-sourced benchmark suite for serverless computing ☆22 · May 20, 2022 · Updated 3 years ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 · Aug 30, 2024 · Updated last year
- (MacOS Support) OpenAI compatible http server for Spark-TTS ☆15 · May 1, 2025 · Updated 10 months ago
- LLamaHTML is a simple HTML file for communicating with a running llama.cpp llama-server ☆22 · Aug 5, 2025 · Updated 7 months ago
- AirLLM 70B inference with a single 4GB GPU ☆17 · Jun 27, 2025 · Updated 8 months ago
- Open source Speechify alternative. Read PDFs and EPUBs with local models. ☆37 · Nov 14, 2025 · Updated 3 months ago
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ☆36 · Jul 2, 2025 · Updated 8 months ago
- QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead ☆32 · Jan 27, 2025 · Updated last year
- A proxy that hosts multiple single-model runners such as llama.cpp and vLLM ☆12 · May 30, 2025 · Updated 9 months ago
- ☆19 · Oct 2, 2024 · Updated last year
- ☆18 · Aug 19, 2025 · Updated 6 months ago
- A single-header math library ☆17 · Nov 7, 2025 · Updated 3 months ago
- 🤖 AI-powered CLI for file reorganization. Runs fully locally; no data leaves your machine. ☆20 · Jul 2, 2025 · Updated 8 months ago
- Model Quantization Benchmark ☆18 · Sep 30, 2025 · Updated 5 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆31 · May 1, 2025 · Updated 10 months ago
- ONNX Command-Line Toolbox ☆35 · Oct 11, 2024 · Updated last year
- KV cache compression for high-throughput LLM inference ☆154 · Feb 5, 2025 · Updated last year