gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, TTS, text-to-image, image-editing, and text-to-video models.
☆244Feb 6, 2026Updated 3 weeks ago
Alternatives and similar repositories for gpt_server
Users that are interested in gpt_server are comparing it to the libraries listed below
- Evaluation for AI apps and agents☆44Jan 18, 2024Updated 2 years ago
- code for piccolo embedding model from SenseTime☆144May 21, 2024Updated last year
- Demo of Tongyi Qianwen (Qwen) inference deployment with vLLM☆640Mar 28, 2024Updated last year
- MFIN7036 NLP Course Project☆10Jul 25, 2024Updated last year
- OpenAI-style API for open large language models; use LLMs just like ChatGPT! Support for LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, X…☆2,467Sep 26, 2024Updated last year
- Accelerate vector generation using ONNX models☆18Jan 23, 2024Updated 2 years ago
- Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.☆1,094Jul 5, 2025Updated 7 months ago
- LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA☆521Dec 31, 2024Updated last year
- unified embedding model☆876Sep 1, 2023Updated 2 years ago
- Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.☆588May 18, 2025Updated 9 months ago
- share data, prompt data, pretraining data☆36Nov 30, 2023Updated 2 years ago
- TextEmbed is a REST API crafted for high-throughput and low-latency embedding inference. It accommodates a wide variety of embedding mode…☆28Sep 5, 2024Updated last year
- ☆12Jun 28, 2024Updated last year
- Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-p…☆9,072Updated this week
- Netease Youdao's open-source embedding and reranker models for RAG products.☆1,860Sep 9, 2025Updated 5 months ago
- 360LayoutAnaylsis, a series of document analysis models and datasets developed by 360 AI Research Institute☆306Sep 10, 2024Updated last year
- A tool for manually annotating and ranking response data in the RLHF stage of large language models.☆256Aug 1, 2023Updated 2 years ago
- An intelligent question answering (dialogue) system☆24Jan 5, 2021Updated 5 years ago
- Deploying Qwen2.5 with FastAPI + vLLM☆26Sep 29, 2024Updated last year
- Accelerating GOT-OCRv2 with VLLM☆11Nov 15, 2024Updated last year
- Inference deployment of the llama3☆11Apr 21, 2024Updated last year
- MyScale Vector Database Benchmark☆16Aug 20, 2024Updated last year
- ☆17Jun 14, 2025Updated 8 months ago
- Useful resources for creating apps and working with flow.☆11Oct 28, 2024Updated last year
- A Python wrapper for the Doc2X API that comes with local text processing (to improve PDF recall in RAG).☆283Updated this week
- 🚀 LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases.☆13Jul 12, 2025Updated 7 months ago
- ☆10Apr 30, 2025Updated 10 months ago
- A Python implementation of the RuoYi front-end/back-end separated framework.☆15Jun 6, 2022Updated 3 years ago
- ☆14Jul 1, 2025Updated 8 months ago
- aigc_serving: lightweight and efficient language model inference serving☆24Jun 12, 2024Updated last year
- OpenAI compatible API for open source LLMs☆16Oct 30, 2023Updated 2 years ago
- ☆55Jan 3, 2025Updated last year
- A local search system implementation using Elasticsearch for Wikipedia data indexing and retrieval.☆12May 17, 2025Updated 9 months ago
- aigc evals☆10Dec 2, 2023Updated 2 years ago
- qwen2 and llama3 cpp implementation☆49Jun 7, 2024Updated last year
- human in the loop in dify workflow by plugin☆14Jan 7, 2025Updated last year
- Like cURL but for MCP☆20Jul 19, 2025Updated 7 months ago
- [EMNLP 2024] LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering☆120Jan 29, 2025Updated last year
- A PyTorch training and prediction framework for Baidu's UIE extraction model☆12Nov 20, 2024Updated last year