OpenCSGs / llm-inference
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.
☆79Updated 9 months ago
Alternatives and similar repositories for llm-inference:
Users that are interested in llm-inference are comparing it to the libraries listed below
- A framework for training large language models; supports LoRA, full-parameter fine-tuning, etc.; define YAML to start training/fine-tuning of y…☆25Updated 5 months ago
- ☆107Updated 10 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆231Updated last week
- A Toolkit for Running On-device Large Language Models (LLMs) in APP☆60Updated 7 months ago
- bisheng-unstructured library☆41Updated 2 months ago
- Using Llama-3.1 70b on Groq to create o1-like reasoning chains☆19Updated 4 months ago
- ✨🦋 illufly is a self-evolving Agent framework: create value quickly through self-evolution☆52Updated this week
- The CSGHub SDK is a powerful Python client specifically designed to interact seamlessly with the CSGHub server. This toolkit is engineere…☆14Updated last month
- Easy, fast, and cheap pretraining, fine-tuning, and serving for everyone☆281Updated this week
- ☆31Updated 11 months ago
- Delta-CoMe can achieve near-lossless 1-bit compression; it has been accepted by NeurIPS 2024☆53Updated 3 months ago
- ☆152Updated this week
- GLM Series Edge Models☆130Updated this week
- Run ChatGLM2-6B on BM1684X☆49Updated 11 months ago
- LLM scheduler user interface☆14Updated 9 months ago
- As the name suggests: a hand-rolled RAG☆118Updated 11 months ago
- An easy-to-use framework for modular RAG☆319Updated this week
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi…☆200Updated 3 months ago
- gpt_server is an open-source framework for production-grade deployment of LLMs or Embeddings.☆152Updated last week
- SUS-Chat: Instruction tuning done right☆48Updated last year
- Byzer-retrieval is a distributed retrieval system designed as a backend for LLM RAG (Retrieval Augmented Generation). The system su…☆45Updated last month
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang☆35Updated 3 months ago
- agentcraft helps you quickly build AI agent applications for all kinds of use cases☆54Updated this week
- ☆299Updated 8 months ago
- AGI module library architecture diagram☆75Updated last year
- Imitate OpenAI with Local Models☆86Updated 5 months ago
- Deploy your own OpenAI API🤩, based on flask and transformers (uses the Baichuan2-13B-Chat-4bits model and can run on a single Tesla T4 GPU); implements OpenAI's Chat, Models, and Completions endpoints, including streaming…☆89Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs☆129Updated 2 months ago
- Mixture-of-Experts (MoE) Language Model☆184Updated 5 months ago