A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
☆900 · Jun 1, 2026 · Updated last week
Alternatives and similar repositories for mosec
Users that are interested in mosec are comparing it to the libraries listed below.
- Reproducible development environment for humans and agents (☆2,209 · May 21, 2026 · Updated 3 weeks ago)
- This repository contains statistics about the AI Infrastructure products. (☆17 · Feb 27, 2025 · Updated last year)
- Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) (☆283 · Nov 3, 2023 · Updated 2 years ago)
- An efficient binary serialization format for numerical data. (☆18 · Nov 3, 2025 · Updated 7 months ago)
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. (☆10,750 · Updated this week)
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali (☆2,827 · Mar 24, 2026 · Updated 2 months ago)
- Fast, flexible LLM inference (☆7,282 · Updated this week)
- Large Language Model Text Generation Inference (☆10,859 · Mar 21, 2026 · Updated 2 months ago)
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (☆2,107 · Jun 30, 2025 · Updated 11 months ago)
- SGLang is a high-performance serving framework for large language models and multimodal models. (☆28,886 · Jun 7, 2026 · Updated last week)
- OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others) (☆277 · Oct 11, 2023 · Updated 2 years ago)
- Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernete… (☆2,155 · Updated this week)
- Efficient, scalable and enterprise-grade CPU/GPU inference server for Hugging Face transformer models (☆1,687 · Oct 23, 2024 · Updated last year)
- Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database. (☆2,172 · Feb 26, 2025 · Updated last year)
- The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! (☆8,670 · Jun 3, 2026 · Updated last week)
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. (☆7,894 · Updated this week)
- Serving multiple LoRA finetuned LLM as one (☆1,159 · May 8, 2024 · Updated 2 years ago)
- Large-scale model inference. (☆629 · Sep 12, 2023 · Updated 2 years ago)
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… (☆4,086 · Jun 7, 2026 · Updated last week)
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (☆1,912 · Jan 21, 2024 · Updated 2 years ago)
- A conversational, AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robot… (☆546 · Mar 27, 2026 · Updated 2 months ago)
- Transformer related optimization, including BERT, GPT (☆6,421 · Mar 27, 2024 · Updated 2 years ago)
- A fast cross-platform AI inference engine using Rust and WebGPU (☆468 · Jan 4, 2025 · Updated last year)
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) (☆28 · Jun 28, 2023 · Updated 2 years ago)
- Turn PostgreSQL into your search engine in a Pythonic way. (☆52 · Aug 29, 2025 · Updated 9 months ago)
- Training and serving large-scale neural networks with auto parallelization. (☆3,184 · Dec 9, 2023 · Updated 2 years ago)
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… (☆4,721 · Apr 9, 2026 · Updated 2 months ago)
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… (☆1,586 · Jan 28, 2026 · Updated 4 months ago)
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ cl… (☆10,070 · Updated this week)
- Docker for Your ML/DL Models Based on OCI Artifacts (☆473 · Jan 26, 2024 · Updated 2 years ago)
- Accessible large language models via k-bit quantization for PyTorch. (☆8,263 · Updated this week)
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… (☆179 · Dec 16, 2025 · Updated 5 months ago)
- Running large language models on a single GPU for throughput-oriented scenarios. (☆9,365 · Oct 28, 2024 · Updated last year)
- Train transformer language models with reinforcement learning. (☆18,613 · Updated this week)
- RayLLM - LLMs on Ray (Archived). Read README for more info. (☆1,267 · Mar 13, 2025 · Updated last year)
- An awesome & curated list of best LLMOps tools for developers (☆5,829 · May 21, 2026 · Updated 3 weeks ago)
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models (☆4,751 · Mar 23, 2026 · Updated 2 months ago)
- All-in-one platform for search, recommendations, RAG, and analytics offered via API (☆2,674 · Jan 25, 2026 · Updated 4 months ago)
- An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifac… (☆1,372 · Updated this week)