llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, compute resource management, monitoring, and more.
☆93 · May 17, 2024 · Updated last year
Alternatives and similar repositories for llm-inference
Users that are interested in llm-inference are comparing it to the libraries listed below.
- A framework for training large language models; supports LoRA, full-parameter fine-tuning, etc. Define a YAML file to start training/fine-tuning of y… ☆31 · Sep 19, 2024 · Updated last year
- LLM scheduler user interface ☆21 · May 17, 2024 · Updated last year
- ☆17 · Mar 24, 2023 · Updated 3 years ago
- An OpenAI-compatible API that integrates LLM, Embedding, and Reranker. ☆18 · Aug 21, 2025 · Updated 7 months ago
- Langchain-Chatchat (formerly Langchain-ChatGLM) adapted to the MS-Serving service of the MindSpore framework ☆19 · Mar 21, 2024 · Updated 2 years ago
- ☆20 · Sep 28, 2024 · Updated last year
- RayLLM - LLMs on Ray (Archived). Read README for more info. ☆1,266 · Mar 13, 2025 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆250 · Mar 15, 2024 · Updated 2 years ago
- A lightweight front-end library for mobile devices ☆17 · May 24, 2013 · Updated 12 years ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 · Dec 13, 2024 · Updated last year
- Large Language Model ONNX Inference Framework ☆35 · Nov 25, 2025 · Updated 3 months ago
- Device plugins for Volcano, e.g. GPU ☆134 · Mar 20, 2025 · Updated last year
- ☆11 · May 20, 2023 · Updated 2 years ago
- Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025) ☆19 · Jan 3, 2026 · Updated 2 months ago
- A unified programming framework for high and portable performance across FPGAs and GPUs ☆11 · Mar 23, 2025 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆49 · Jan 21, 2026 · Updated 2 months ago
- This repository contains the results and code for the MLPerf™ Inference v2.1 benchmark. ☆18 · Jul 24, 2025 · Updated 8 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆466 · May 30, 2025 · Updated 9 months ago
- Secure and Scalable Federated Learning using Serverless Computing ☆12 · Jan 31, 2024 · Updated 2 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆58 · Aug 21, 2024 · Updated last year
- ☆152 · Oct 9, 2024 · Updated last year
- Mathematical expression evaluator with just-in-time code generation. ☆12 · Apr 7, 2013 · Updated 12 years ago
- An adaptation of Senders/Receivers for async networking and I/O ☆19 · Apr 25, 2025 · Updated 11 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 · Sep 23, 2025 · Updated 6 months ago
- ☆32 · Updated this week
- 🤖 Kubernetes for AI Agents. Self-hosted, production-grade runtime for orchestrating LLM swarms and autonomous agents. TypeScript-native. ☆32 · Mar 16, 2026 · Updated last week
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents ☆231 · Jun 16, 2025 · Updated 9 months ago
- ☢️ TensorRT 2023 competition, second round: inference acceleration of the Llama model based on TensorRT-LLM ☆50 · Oct 20, 2023 · Updated 2 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆118 · Mar 13, 2024 · Updated 2 years ago
- Light local website for displaying performances from different chat models. ☆86 · Nov 13, 2023 · Updated 2 years ago
- A minimal toolkit for Context Engineering: Select, Compress, and Persist context with pure functions. ☆38 · Jan 20, 2026 · Updated 2 months ago
- ☆12 · Mar 31, 2021 · Updated 4 years ago
- ☆25 · Aug 27, 2021 · Updated 4 years ago
- ☆29 · May 13, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆13 · Oct 10, 2025 · Updated 5 months ago
- ☆23 · Jul 8, 2024 · Updated last year
- Selection-based Question Answering ☆14 · Feb 7, 2018 · Updated 8 years ago
- Fork of NACA from Google Code ☆13 · Feb 25, 2010 · Updated 16 years ago
- ☆12 · Jun 3, 2019 · Updated 6 years ago