llm-inference is a platform for publishing and managing LLM inference. It provides a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, and monitoring.
☆92May 17, 2024Updated last year
Alternatives and similar repositories for llm-inference
Users that are interested in llm-inference are comparing it to the libraries listed below
- A framework for training large language models; supports LoRA, full-parameter fine-tuning, etc. Define YAML to start training/fine-tuning of y…☆31Sep 19, 2024Updated last year
- This repository provides installation scripts and configuration files for deploying the CSGHub instance, includes Helm charts and Docker…☆18Updated this week
- LLM scheduler user interface☆21May 17, 2024Updated last year
- Azure Machine Learning - MLOps Python SDKv2☆10Jul 24, 2023Updated 2 years ago
- An open-source framework for building monolithic or distributed agentic systems, ranging from simple LLM calls to compositional workflows…☆25Jan 14, 2026Updated last month
- ☆13Jan 7, 2025Updated last year
- ☆11Jan 8, 2025Updated last year
- Repo for sample files used in the developer hub for the Power Platform☆19Sep 3, 2024Updated last year
- Keya Medical COVID-19 detection service☆15Apr 3, 2020Updated 5 years ago
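- GeekAI-Agent is an agent platform focused on AI conversation. It supports LLM API access and one-click import of agents from Coze, Dify, Alibaba Bailian applications, and others. It ships with back-office administration and a payment system, and covers essential operational features such as user credit top-ups, credit consumption logs, user management, application management, and system settings, so that your agents can easily…☆17Dec 23, 2025Updated 2 months ago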
- A side project that follows all the acceleration tricks in tinyllama, with minimal modification to the Hugging Face transformers code.☆13Sep 2, 2024Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework☆48Jan 21, 2026Updated last month
- Based on one-api and new-api; adds support for models such as luma, runway, and kling, and adds WeChat Pay.☆32Jan 27, 2026Updated last month
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod…☆20Dec 9, 2022Updated 3 years ago
- RayLLM - LLMs on Ray (Archived). Read README for more info.☆1,267Mar 13, 2025Updated 11 months ago
- ☆20Sep 28, 2024Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆251Mar 15, 2024Updated last year
- AI Multi-agent system for real-time, adaptive supply chain coordination and optimization leveraging responsive AI clusters.☆36Mar 28, 2024Updated last year
- ☆150Oct 9, 2024Updated last year
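- DeepParseX is a powerful multimodal document parsing and knowledge management platform. It supports intelligent parsing of many file formats, including PDF, Word, Excel, PPT, images, video, and audio, automatically extracts key information, and builds retrieval-augmented generation (RAG) and knowledge graph systems, enabling structured data…☆56Feb 21, 2026Updated last week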
- NextJS, React, Lama, VAPI, Twillio, Retell, Telegram, Sales☆27Nov 4, 2024Updated last year
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models.☆24Nov 6, 2024Updated last year
- ☢️ TensorRT 2023 semifinal round: Llama model inference acceleration and optimization based on TensorRT-LLM☆51Oct 20, 2023Updated 2 years ago
- ☆29May 13, 2024Updated last year
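- 2025.01: Implemented a multimodal large model from scratch and named it Reyes (睿视; R: 睿, "wise"; eyes: 眼, "eye"). Reyes has 8B parameters, with InternViT-300M-448px-V2_5 as the vision encoder and Qwen2.5-7B-Instruct on the language-model side. Reyes also uses a two…☆31Feb 10, 2026Updated 2 weeks ago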
- Kaggle AIMO2 solution with token-efficient reasoning LLM recipes☆42Aug 7, 2025Updated 6 months ago
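- CozeX (扣子X) is open-source SaaS software, built on wrappers around the Coze API and the Volcano Engine Ark LLM platform SDK/API, that lets enterprises quickly deploy and use Volcano Engine's large-model AI products.☆41Sep 22, 2025Updated 5 months ago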
- 🗲 A high-performance on-disk dictionary.☆29Dec 4, 2025Updated 2 months ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.☆58Aug 21, 2024Updated last year
- Data mapping framework for rust stuff☆46Updated this week
- A third-party component library based on Gradio. Integrates Ant Design, Ant Design X, Monaco Editor and more advanced components to help…☆137Nov 20, 2025Updated 3 months ago
- ☆39Jan 20, 2025Updated last year
- ☆25Aug 27, 2021Updated 4 years ago
- An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel…☆11Jan 23, 2026Updated last month
- Opinionated Langchain setup with Qdrant vector store and Kong gateway☆32Apr 7, 2023Updated 2 years ago
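- GPT-VIS-API is a lightweight chart generation service designed to address the limitations of [antv/mcp-server-chart](https://github.com/antvis/mcp-server-chart) in private deployments. The service receives data requests, generates chart images, and uploads them to MinI…☆57Jan 27, 2026Updated last month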
- ☆71Mar 26, 2025Updated 11 months ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents☆228Jun 16, 2025Updated 8 months ago