llm-inference is a platform for publishing and managing LLM inference. It provides a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.
☆94 May 17, 2024 Updated last year
Alternatives and similar repositories for llm-inference
Users that are interested in llm-inference are comparing it to the libraries listed below.
- A framework for training large language models; supports LoRA, full-parameter fine-tuning, etc.; define YAML to start training/fine-tuning of y… ☆31 Sep 19, 2024 Updated last year
- LLM scheduler user interface ☆21 May 17, 2024 Updated last year
- ☆17 Mar 24, 2023 Updated 3 years ago
- Plug in and Play Prompt Technique to Boost Model reasoning by 40% ☆10 May 30, 2023 Updated 2 years ago
- An OpenAI Compatible API which integrates LLM, Embedding and Reranker. ☆18 Aug 21, 2025 Updated 8 months ago
- ☆13 Jan 7, 2025 Updated last year
- ☆20 Sep 28, 2024 Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆250 Mar 15, 2024 Updated 2 years ago
- ☆11 Jan 8, 2025 Updated last year
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 Dec 13, 2024 Updated last year
- fast-embeddings-api ☆16 Nov 23, 2023 Updated 2 years ago
- Open Source Text Embedding Models with OpenAI Compatible API ☆168 Jul 13, 2024 Updated last year
- Azure Machine Learning - MLOps Python SDKv2 ☆10 Jul 24, 2023 Updated 2 years ago
- Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025) ☆19 Jan 3, 2026 Updated 4 months ago
- Deduplication over dis-aggregated memory for Serverless Computing ☆14 Mar 21, 2022 Updated 4 years ago
- Data mapping framework for rust stuff ☆51 Mar 25, 2026 Updated last month
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆482 May 30, 2025 Updated 11 months ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆59 Aug 21, 2024 Updated last year
- ☆158 Oct 9, 2024 Updated last year
- Mathematical expression evaluator with just in time code generation. ☆12 Apr 7, 2013 Updated 13 years ago
- Kubernetes device plugin for Biren GPU ☆11 Oct 17, 2024 Updated last year
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 Sep 23, 2025 Updated 7 months ago
- ☆37 Updated this week
- Opinionated Langchain setup with Qdrant vector store and Kong gateway ☆32 Apr 7, 2023 Updated 3 years ago
- Redis module that provides ratelimit ☆25 Apr 21, 2020 Updated 6 years ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents ☆232 Jun 16, 2025 Updated 10 months ago
- ☢️ TensorRT 2023 second round: inference acceleration and optimization of Llama models based on TensorRT-LLM ☆52 Oct 20, 2023 Updated 2 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 Mar 13, 2024 Updated 2 years ago
- Light local website for displaying performances from different chat models. ☆86 Nov 13, 2023 Updated 2 years ago
- A minimal toolkit for Context Engineering: Select, Compress, and Persist context with pure functions. ☆45 Jan 20, 2026 Updated 3 months ago
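- GeekAI-Agent is an agent platform focused on AI conversation. It supports large-model API access and one-click import of agents from Coze, Dify, Alibaba Bailian applications, and others. It ships with a built-in admin backend and payment system, and supports user credit top-ups, credit consumption logs, user management, application management, system settings, and other essential operations features, so your agent can easily… ☆20 Dec 23, 2025 Updated 4 months ago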
- ☆25 Aug 27, 2021 Updated 4 years ago
- Selection-based Question Answering ☆14 Feb 7, 2018 Updated 8 years ago
- ☆23 Jul 8, 2024 Updated last year
- A Beginner's Guide to AI Engineering (SEC Financial News Generation) ☆26 Sep 1, 2024 Updated last year
- Fork of NACA from Google Code ☆13 Feb 25, 2010 Updated 16 years ago
- ☆12 Jun 3, 2019 Updated 6 years ago
- A throughput-oriented high-performance serving framework for LLMs ☆956 Mar 29, 2026 Updated last month
- A third-party component library based on Gradio. Integrates Ant Design, Ant Design X, Monaco Editor and more advanced components to help… ☆142 Apr 22, 2026 Updated last week