A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
⭐ 892 · Mar 1, 2026 · Updated this week
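Mosec's headline feature, dynamic batching, can be illustrated with a minimal stand-alone sketch (this is the general idea, not mosec's actual API): requests are pulled from a queue until the batch fills up or a wait deadline passes, then the model runs once on the whole batch. `handle_batch` is a hypothetical stand-in for the model call.

```python
import queue
import threading
import time


def dynamic_batcher(requests, handle_batch, max_batch_size=8, max_wait_s=0.01):
    """Collect requests into batches: flush when the batch is full or the
    wait deadline passes. A `None` item is the shutdown sentinel."""
    while True:
        first = requests.get()  # block until at least one request arrives
        if first is None:
            return
        batch = [first]
        deadline = time.monotonic() + max_wait_s
        while len(batch) < max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # deadline passed: flush a partial batch
            try:
                item = requests.get(timeout=remaining)
            except queue.Empty:
                break
            if item is None:
                handle_batch(batch)  # flush what we have before shutting down
                return
            batch.append(item)
        handle_batch(batch)  # one model call per batch, not per request
```

Amortizing the per-call overhead this way is what lets batching-aware servers trade a small, bounded latency (`max_wait_s`) for much higher throughput.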
Alternatives and similar repositories for mosec
Users interested in mosec are comparing it to the libraries listed below.
- Reproducible development environment for humans and agents (⭐ 2,184 · Updated this week)
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others) (⭐ 281 · Nov 3, 2023 · Updated 2 years ago)
- This repository contains statistics about AI infrastructure products. (⭐ 17 · Feb 27, 2025 · Updated last year)
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. (⭐ 10,393 · Updated this week)
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (⭐ 2,097 · Jun 30, 2025 · Updated 8 months ago)
- An efficient binary serialization format for numerical data. (⭐ 17 · Nov 3, 2025 · Updated 4 months ago)
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali (⭐ 2,688 · Feb 5, 2026 · Updated 3 weeks ago)
- Large Language Model Text Generation Inference (⭐ 10,788 · Jan 8, 2026 · Updated last month)
- SGLang is a high-performance serving framework for large language models and multimodal models. (⭐ 23,905 · Updated this week)
- The easiest way to serve AI apps and models - build model inference APIs, job queues, LLM apps, multi-model pipelines, and more! (⭐ 8,472 · Updated this week)
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. (⭐ 7,645 · Updated this week)
- Serving multiple LoRA-finetuned LLMs as one (⭐ 1,145 · May 8, 2024 · Updated last year)
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design and easy scalability… (⭐ 3,919 · Updated this week)
- Scalable, low-latency, hybrid-enabled vector search in Postgres. Revolutionize vector search, not the database. (⭐ 2,158 · Feb 26, 2025 · Updated last year)
- Efficient, scalable, and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models (⭐ 1,687 · Oct 23, 2024 · Updated last year)
- OpenAI-compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM, and many others) (⭐ 276 · Oct 11, 2023 · Updated 2 years ago)
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (⭐ 1,899 · Jan 21, 2024 · Updated 2 years ago)
- Large-scale model inference. (⭐ 627 · Sep 12, 2023 · Updated 2 years ago)
- Fast, flexible LLM inference (⭐ 6,623 · Updated this week)
- Turn PostgreSQL into your search engine in a Pythonic way. (⭐ 51 · Aug 29, 2025 · Updated 6 months ago)
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access and manage all AI compute (Kubernetes, 20+ clouds, o… (⭐ 9,516 · Updated this week)
- A fast cross-platform AI inference engine using Rust and WebGPU (⭐ 463 · Jan 4, 2025 · Updated last year)
- Training and serving large-scale neural networks with auto parallelization. (⭐ 3,183 · Dec 9, 2023 · Updated 2 years ago)
- Transformer-related optimization, including BERT and GPT (⭐ 6,398 · Mar 27, 2024 · Updated last year)
- AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… (⭐ 4,706 · Jan 12, 2026 · Updated last month)
- Docker for your ML/DL models, based on OCI artifacts (⭐ 474 · Jan 26, 2024 · Updated 2 years ago)
- Accessible large language models via k-bit quantization for PyTorch. (⭐ 7,997 · Updated this week)
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable… (⭐ 1,585 · Jan 28, 2026 · Updated last month)
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training (⭐ 1,863 · Updated this week)
- An awesome & curated list of the best LLMOps tools for developers (⭐ 5,645 · Feb 3, 2026 · Updated last month)
- An MLOps framework to package, deploy, monitor, and manage thousands of production machine learning models (⭐ 4,731 · Updated this week)
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. (⭐ 4,843 · Updated this week)
- An inference server for your machine learning models, with support for multiple frameworks, multi-model serving, and more (⭐ 875 · Updated this week)
- RayLLM - LLMs on Ray (archived; read the README for more info). (⭐ 1,267 · Mar 13, 2025 · Updated 11 months ago)
- Running large language models on a single GPU for throughput-oriented scenarios. (⭐ 9,382 · Oct 28, 2024 · Updated last year)
- Model deployment at scale on Kubernetes (⭐ 836 · May 8, 2024 · Updated last year)
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs (⭐ 3,728 · May 21, 2025 · Updated 9 months ago)
- Module, Model, and Tensor Serialization/Deserialization (⭐ 289 · Feb 6, 2026 · Updated 3 weeks ago)
- Standardized distributed generative and predictive AI inference platform for scalable, multi-framework deployment on Kubernetes (⭐ 5,135 · Updated this week)