llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
☆94May 17, 2024Updated 2 years ago
Alternatives and similar repositories for llm-inference
Users that are interested in llm-inference are comparing it to the libraries listed below.
- A framework for training large language models; supports LoRA, full-parameter fine-tuning, etc. Define a YAML file to start training/fine-tuning of y…☆31Sep 19, 2024Updated last year
- This repository provides installation scripts and configuration files for deploying the CSGHub instance, includes Helm charts and Docker…☆21Updated this week
- The CSGHub SDK is a powerful Python client specifically designed to interact seamlessly with the CSGHub server. This toolkit is engineere…☆23Apr 8, 2026Updated last month
- An open-source framework for building monolithic or distributed agentic systems, ranging from simple LLM calls to compositional workflows…☆29Jan 14, 2026Updated 4 months ago
- ☆17Mar 24, 2023Updated 3 years ago
- An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.☆18Aug 21, 2025Updated 9 months ago
- A structured learning repo for retrieval-augmented generation, from foundations to production patterns.☆26Apr 19, 2026Updated last month
- ☆13Jan 7, 2025Updated last year
- Langchain-Chatchat (formerly Langchain-ChatGLM) adapted to the MS-Serving service of the MindSpore framework☆19Mar 21, 2024Updated 2 years ago
- ☆20Sep 28, 2024Updated last year
- A side project that follows all the acceleration tricks in tinyllama, with minimal modifications to the huggingface transformers code.☆13Sep 2, 2024Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆252Mar 15, 2024Updated 2 years ago
- ☆11Jan 8, 2025Updated last year
- Code for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems"☆13Dec 13, 2024Updated last year
- Large Language Model Onnx Inference Framework☆35Nov 25, 2025Updated 6 months ago
- Device plugins for Volcano, e.g. GPU☆137Mar 20, 2025Updated last year
- ☆75Mar 26, 2025Updated last year
- Open Source Text Embedding Models with OpenAI Compatible API☆168Jul 13, 2024Updated last year
- Azure Machine Learning - MLOps Python SDKv2☆10Jul 24, 2023Updated 2 years ago
- NVIDIA TensorRT Hackathon 2023 semifinal topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆43Oct 20, 2023Updated 2 years ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.☆17Aug 4, 2022Updated 3 years ago
- Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025)☆19Jan 3, 2026Updated 4 months ago
- Deduplication over dis-aggregated memory for Serverless Computing☆14Mar 21, 2022Updated 4 years ago
- A unified programming framework for high and portable performance across FPGAs and GPUs☆11Mar 23, 2025Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention☆485May 30, 2025Updated 11 months ago
- Secure and Scalable Federated Learning using Serverless Computing☆12Jan 31, 2024Updated 2 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.☆59Aug 21, 2024Updated last year
- ☆160Oct 9, 2024Updated last year
- An adaption of Senders/Receivers for async networking and I/O☆20Apr 25, 2025Updated last year
- Mathematical expression evaluator with just in time code generation.☆12Apr 7, 2013Updated 13 years ago
- Data mapping framework for rust stuff☆53Mar 25, 2026Updated 2 months ago
- Kubernetes device plugin for Biren GPU☆11Oct 17, 2024Updated last year
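- AI e-commerce product promotion video generator | Upload product images to automatically generate promotional short videos for Douyin/Kuaishou/Xiaohongshu | Kling 3.0 / Veo 3 / Seedance 1.5 / FLUX☆74Mar 23, 2026Updated 2 months ago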
- Pretrain, finetune and serve LLMs on Intel platforms with Ray☆130Sep 23, 2025Updated 8 months ago
- Vortex: A Flexible and Efficient Sparse Attention Framework☆53May 17, 2026Updated last week
- ☆13Apr 28, 2017Updated 9 years ago
- ☢️ TensorRT 2023 semifinal: inference acceleration and optimization of Llama models based on TensorRT-LLM☆52Oct 20, 2023Updated 2 years ago
- ☆39May 18, 2026Updated last week
- A Keras-based and TensorFlow-backend NLP Models Toolkit.☆12Jul 7, 2022Updated 3 years ago