OpenCSGs / llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.

☆86

Alternatives and similar repositories for llm-inference

Users who are interested in llm-inference are comparing it to the libraries listed below.

OpenCSGs / llm-finetune
A framework for training large language models; supports LoRA, full-parameter fine-tuning, etc. Define YAML to start training/fine-tuning of y…
☆30Updated last year
01-ai / Descartes
☆112Updated last year
dataelement / bisheng-unstructured
bisheng-unstructured library
☆55Updated 5 months ago
HFAiLab / hai-platform-studio
Integrated user interface for use with HAI Platform
☆53Updated 2 years ago
OpenCSGs / llm-scheduler-ui
LLM scheduler user interface
☆18Updated last year
allwefantasy / byzer-llm
Easy, fast, and cheap pretraining, fine-tuning, and serving for everyone
☆315Updated 3 months ago
xverse-ai / XVERSE-MoE-A4.2B
XVERSE-MoE-A4.2B: A multilingual large language model developed by XVERSE Technology Inc.
☆39Updated last year
zilliztech / akcio
Akcio is a demonstration project for Retrieval Augmented Generation (RAG). It leverages the power of LLM to generate responses and uses v…
☆258Updated last year
zai-org / GLM-Edge
GLM Series Edge Models
☆149Updated 4 months ago
tpoisonooo / ROGRAG
[ACL2025 demo track] ROGRAG: A Robustly Optimized GraphRAG Framework
☆175Updated 3 weeks ago
xorbitsai / xllamacpp
xllamacpp - a Python wrapper of llama.cpp
☆60Updated last week
allwefantasy / BYZER-RETRIEVAL
Byzer-retrieval is a distributed retrieval system designed as a backend for LLM RAG (Retrieval Augmented Generation). The system su…
☆49Updated 7 months ago
NVIDIA / workbench-llamafactory
This is an NVIDIA AI Workbench example project that demonstrates an end-to-end model development workflow using Llamafactory.
☆67Updated last year
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆248Updated last year
shootime2021 / APUS-xDAN-4.0-moe
An open-source LLM based on a Mixture-of-Experts (MoE) structure.
☆58Updated last year
dataelement / bisheng-rt
bisheng model services backend
☆30Updated last year
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆131Updated last month
allwefantasy / byzer-agent
☆32Updated last year
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆266Updated 2 months ago
codefuse-ai / FasterTransformer4CodeFuse
High-performance LLM inference based on our optimized version of FasterTransformer
☆123Updated last year
thunlp / Delta-CoMe
Delta-CoMe can achieve near-lossless 1-bit compression; it has been accepted by NeurIPS 2024
☆57Updated 11 months ago
alibaba / app-controller
App-Controller: Allow users to manipulate your App with natural language
☆131Updated 11 months ago
sophgo / ChatGLM3-TPU
Run chatglm3-6b on BM1684X
☆40Updated last year
hyperai / vllm-cn
vLLM documentation in Simplified Chinese
☆114Updated last week
OpenBMB / MobileCPM
A Toolkit for Running On-device Large Language Models (LLMs) in APP
☆78Updated last year
OpenCSGs / csghub-charts
This repository provides installation scripts and configuration files for deploying a CSGHub instance, including Helm charts and Docker…
☆16Updated this week
SomeoneKong / llm_long_context_bench202405
☆29Updated last year
pandada8 / llm-inference-benchmark
LLM inference service performance testing
☆43Updated last year
infinigence / InfiniWebSearch
A demo built on Megrez-3B-Instruct, integrating a web search tool to enhance the model's question-and-answer capabilities.
☆39Updated 10 months ago
IEIT-Yuan / Yuan2.0-M32
Mixture-of-Experts (MoE) Language Model
☆189Updated last year