llm-inference is a platform for publishing and managing LLM inference. It provides a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
☆94 · May 17, 2024 · Updated 2 years ago
Alternatives and similar repositories for llm-inference
Users who are interested in llm-inference are comparing it to the libraries listed below.
- A framework for training large language models: supports LoRA, full-parameter fine-tuning, etc.; define a YAML file to start training/fine-tuning of y… ☆31 · Sep 19, 2024 · Updated last year
- The CSGHub SDK is a powerful Python client specifically designed to interact seamlessly with the CSGHub server. This toolkit is engineere… ☆23 · Jun 4, 2026 · Updated last week
- LLM scheduler user interface ☆21 · May 17, 2024 · Updated 2 years ago
- AutoHub: A Personal Browser Automation Assistant ☆25 · Jul 30, 2025 · Updated 10 months ago
- ☆17 · Mar 24, 2023 · Updated 3 years ago
- An OpenAI-compatible API that integrates LLM, Embedding, and Reranker models. ☆18 · Aug 21, 2025 · Updated 9 months ago
- ☆34 · Jan 17, 2025 · Updated last year
- A structured learning repo for retrieval-augmented generation, from foundations to production patterns. ☆26 · Apr 19, 2026 · Updated last month
- ☆13 · Jan 7, 2025 · Updated last year
- Langchain-Chatchat (formerly Langchain-ChatGLM) adapted to the MS-Serving service of the MindSpore framework ☆19 · Mar 21, 2024 · Updated 2 years ago
- A side project that follows all the acceleration tricks in tinyllama, with minimal modifications to the huggingface transformers code. ☆13 · Sep 2, 2024 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆251 · Mar 15, 2024 · Updated 2 years ago
- A lightweight front-end class library for mobile devices ☆17 · May 24, 2013 · Updated 13 years ago
- Elastic-Grok-Script-Plugin provides a Grok plug-in for Elasticsearch. ☆12 · Dec 6, 2016 · Updated 9 years ago
- Large Language Model Onnx Inference Framework ☆35 · Nov 25, 2025 · Updated 6 months ago
- Device plugins for Volcano, e.g. GPU ☆137 · Mar 20, 2025 · Updated last year
- ModelVerse is a powerful all-in-one inference and training platform for large language models (LLMs), aiming to give AI developers and researchers a complete model-lifecycle management solution. From model management to inference deployment, and from training and fine-tuning to performance evaluation, ModelVerse turns complex AI workflows into an intuitive, easy-to-use integrated platform. ☆44 · Aug 14, 2025 · Updated 10 months ago
- Azure Machine Learning - MLOps Python SDKv2 ☆10 · Jul 24, 2023 · Updated 2 years ago
- Open Source Text Embedding Models with OpenAI Compatible API ☆168 · Jul 13, 2024 · Updated last year
- A source-code walkthrough of the official transformers library. In the era of large AI models, PyTorch and Transformer are the new operating system, and everything else is software running on top of them. ☆16 · Sep 25, 2023 · Updated 2 years ago
- Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025) ☆19 · May 27, 2026 · Updated 2 weeks ago
- Deduplication over disaggregated memory for Serverless Computing ☆14 · Mar 21, 2022 · Updated 4 years ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆491 · Updated this week
- Secure and Scalable Federated Learning using Serverless Computing ☆12 · Jan 31, 2024 · Updated 2 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆58 · Aug 21, 2024 · Updated last year
- Detect-GPU is an HTTP server that detects NVIDIA GPU information on the host. ☆20 · Updated this week
- Mathematical expression evaluator with just-in-time code generation. ☆12 · Apr 7, 2013 · Updated 13 years ago
- Kubernetes device plugin for Biren GPU ☆11 · Oct 17, 2024 · Updated last year
- Reddit clone built using Golang (Gin framework), GORM (ORM library), and React.js (Material UI) ☆16 · Aug 11, 2022 · Updated 3 years ago
- Repo for sample files used in the developer hub for the Power Platform ☆20 · Sep 3, 2024 · Updated last year
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- ☆13 · Apr 28, 2017 · Updated 9 years ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents ☆232 · Jun 16, 2025 · Updated 11 months ago
- ☢️ TensorRT 2023 competition, second round: inference acceleration and optimization of Llama models with TensorRT-LLM ☆53 · Oct 20, 2023 · Updated 2 years ago
- Idiomatic Go bindings for the Ghostscript interpreter C API. ☆31 · Aug 5, 2025 · Updated 10 months ago
- ☆42 · Jun 5, 2026 · Updated last week
- A Keras-based NLP models toolkit with a TensorFlow backend. ☆12 · Jul 7, 2022 · Updated 3 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆122 · Mar 13, 2024 · Updated 2 years ago
- A minimal toolkit for Context Engineering: Select, Compress, and Persist context with pure functions. ☆46 · Jan 20, 2026 · Updated 4 months ago