Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.
☆4,573 · Mar 4, 2026 · Updated this week
Alternatives and similar repositories for gpustack
Users who are interested in gpustack are comparing it to the libraries listed below.
- LM inference server implementation based on *.cpp. ☆296 · Nov 24, 2025 · Updated 3 months ago
- Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-p… ☆9,089 · Updated this week
- A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations ☆16,716 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆24,216 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆71,883 · Updated this week
- RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat… ☆74,309 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,645 · Updated this week
- Heterogeneous AI Computing Virtualization Middleware (Project under CNCF) ☆3,060 · Updated this week
- Production-ready platform for agentic workflow development. ☆131,572 · Updated this week
- FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data process… ☆27,256 · Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆67,966 · Updated this week
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM. ☆53,029 · Updated this week
- 🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents. ☆20,227 · Updated this week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆31,296 · Updated this week
- Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows. ☆55,756 · Updated this week
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆197 · Dec 23, 2025 · Updated 2 months ago
- AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents ☆18,199 · Updated this week
- Run frontier AI locally. ☆42,347 · Updated this week
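- LLM API management and distribution system supporting mainstream models including OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, ERNIE Bot, iFlytek Spark, Tongyi Qianwen, 360 Zhinao, and Tencent Hunyuan, with unified API adaptation; can be used for key … ☆30,097 · Jan 9, 2026 · Updated 2 months ago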
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆37,994 · Updated this week
- The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement, running on consumer-g… ☆43,229 · Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆15,036 · Feb 28, 2026 · Updated last week
- User-friendly AI Interface (Supports Ollama, OpenAI API, ...) ☆126,337 · Updated this week
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python. ☆7,665 · Nov 19, 2025 · Updated 3 months ago
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆15,126 · Updated this week
- Universal memory layer for AI Agents ☆48,604 · Updated this week
- Question and Answer based on Anything. ☆13,871 · Mar 24, 2025 · Updated 11 months ago
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,787 · Jul 4, 2025 · Updated 8 months ago
- The all-in-one AI productivity accelerator. On-device and privacy-first, with no annoying setup or configuration. ☆55,868 · Updated this week
- 🤖 AI Gateway | AI Native API Gateway ☆7,666 · Updated this week
- No fortress, purely open ground. OpenManus is Coming. ☆55,070 · Feb 11, 2026 · Updated 3 weeks ago
- 🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄. ☆22,891 · Feb 2, 2026 · Updated last month
- Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain… ☆37,430 · Nov 10, 2025 · Updated 3 months ago
- An open-source RAG-based tool for chatting with your documents. ☆25,193 · Updated this week
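- cube studio is an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform: full MLOps algorithm workflow, compute rental platform, online notebook development, drag-and-drop task-flow pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving, vGPU virtualization, edge computing, automated annotation on a labeling platform, deepseek… ☆4,873 · Feb 6, 2026 · Updated last month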
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆24,027 · Feb 23, 2026 · Updated 2 weeks ago
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆247 · Feb 11, 2026 · Updated 3 weeks ago
- A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. ☆2,463 · Updated this week
- Build, run, manage agentic software at scale. ☆38,516 · Updated this week