llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.
☆94 · May 17, 2024 · Updated last year
Alternatives and similar repositories for llm-inference
Users that are interested in llm-inference are comparing it to the libraries listed below.
- A framework for training large language models; supports LoRA, full-parameter fine-tuning, etc.; define YAML to start training/fine-tuning of y… ☆31 · Sep 19, 2024 · Updated last year
- This repository provides installation scripts and configuration files for deploying the CSGHub instance, includes Helm charts and Docker… ☆20 · Updated this week
- LLM scheduler user interface ☆21 · May 17, 2024 · Updated last year
- ☆17 · Mar 24, 2023 · Updated 3 years ago
- ☆13 · Jan 7, 2025 · Updated last year
- ☆20 · Sep 28, 2024 · Updated last year
- RayLLM - LLMs on Ray (Archived). Read README for more info. ☆1,267 · Mar 13, 2025 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆250 · Mar 15, 2024 · Updated 2 years ago
- An agent that can run everywhere - even in your watch! ☆30 · Apr 8, 2026 · Updated last week
- Azure Machine Learning - MLOps Python SDKv2 ☆10 · Jul 24, 2023 · Updated 2 years ago
- Everything about LLM-based agents ☆24 · Dec 19, 2025 · Updated 3 months ago
- Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025) ☆19 · Jan 3, 2026 · Updated 3 months ago
- Deduplication over disaggregated memory for Serverless Computing ☆14 · Mar 21, 2022 · Updated 4 years ago
- A unified programming framework for high and portable performance across FPGAs and GPUs ☆11 · Mar 23, 2025 · Updated last year
- Data mapping framework for rust stuff ☆51 · Mar 25, 2026 · Updated 3 weeks ago
- ☆15 · Updated this week
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆474 · May 30, 2025 · Updated 10 months ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆59 · Aug 21, 2024 · Updated last year
- ☆156 · Oct 9, 2024 · Updated last year
- Mathematical expression evaluator with just-in-time code generation. ☆12 · Apr 7, 2013 · Updated 13 years ago
- An adaptation of Senders/Receivers for async networking and I/O ☆19 · Apr 25, 2025 · Updated 11 months ago
- From jieba word segmentation to BERT-wwm: a step-by-step introduction to the world of Chinese NLP ☆15 · Sep 1, 2022 · Updated 3 years ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 · Sep 23, 2025 · Updated 6 months ago
- Repo for sample files used in the developer hub for the Power Platform ☆20 · Sep 3, 2024 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆51 · Updated this week
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- ☆34 · Updated this week
- Opinionated LangChain setup with Qdrant vector store and Kong gateway ☆32 · Apr 7, 2023 · Updated 3 years ago
- ☢️ TensorRT 2023 second round: accelerating and optimizing Llama model inference with TensorRT-LLM ☆52 · Oct 20, 2023 · Updated 2 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆118 · Mar 13, 2024 · Updated 2 years ago
- A minimal toolkit for Context Engineering: Select, Compress, and Persist context with pure functions. ☆44 · Jan 20, 2026 · Updated 2 months ago
- ☆10 · Sep 4, 2016 · Updated 9 years ago
- A retargetable and extensible synthesis-based compiler for modern hardware architectures ☆17 · Nov 20, 2025 · Updated 4 months ago
- 🤖 Kubernetes for AI Agents. Self-hosted, production-grade runtime for orchestrating LLM swarms and autonomous agents. TypeScript-native. ☆35 · Updated this week
- Selection-based Question Answering ☆14 · Feb 7, 2018 · Updated 8 years ago
- LLM-related stuff, including code and docs ☆13 · Feb 25, 2025 · Updated last year
- LLM Inference with Deep Learning Accelerator. ☆60 · Jan 23, 2025 · Updated last year
- SQLynx Pro: Desktop and Web SQL Tool. Both web and desktop access. Supports popular SQL databases like MySQL, MariaDB, PostgreSQL, SQLite … ☆30 · May 11, 2025 · Updated 11 months ago
- ☆13 · Jun 3, 2019 · Updated 6 years ago