A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
☆899 · May 1, 2026 · Updated this week
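The headline feature, dynamic batching, groups individually arriving requests into a single batch before running inference, trading a few milliseconds of latency for much higher throughput. Below is a minimal, framework-agnostic sketch of the idea, not mosec's actual implementation; the `DynamicBatcher` name and its parameters are illustrative only:

```python
import queue
import threading
import time

class DynamicBatcher:
    """Collect requests into a batch, flushing on size or timeout.

    Illustrative sketch of the dynamic-batching idea; real servers
    such as mosec implement this with dedicated worker processes.
    """

    def __init__(self, handler, max_batch_size=8, max_wait_ms=10):
        self.handler = handler          # called with a list of inputs
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait_ms / 1000.0
        self.requests = queue.Queue()   # holds (input, done_event, slot)

    def submit(self, item):
        """Called by request threads; blocks until the batch returns."""
        done = threading.Event()
        slot = {}
        self.requests.put((item, done, slot))
        done.wait()
        return slot["result"]

    def _loop_once(self):
        # Block for the first request, then wait briefly for more,
        # flushing when the batch is full or the deadline passes.
        batch = [self.requests.get()]
        deadline = time.monotonic() + self.max_wait
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.requests.get(timeout=remaining))
            except queue.Empty:
                break
        inputs = [item for item, _, _ in batch]
        outputs = self.handler(inputs)  # one batched inference call
        for (_, done, slot), out in zip(batch, outputs):
            slot["result"] = out
            done.set()

    def run_forever(self):
        while True:
            self._loop_once()
```

Usage: run `run_forever` in a background thread and call `submit` from each request handler; concurrent callers are transparently served from one batched call.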
Alternatives and similar repositories for mosec
Users interested in mosec are comparing it to the libraries listed below.
- Reproducible development environment for humans and agents ☆2,200 · Updated this week
- This repository contains statistics about AI Infrastructure products. ☆17 · Feb 27, 2025 · Updated last year
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others) ☆282 · Nov 3, 2023 · Updated 2 years ago
- An efficient binary serialization format for numerical data. ☆18 · Nov 3, 2025 · Updated 6 months ago
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. ☆10,625 · Updated this week
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali ☆2,782 · Mar 24, 2026 · Updated last month
- Fast, flexible LLM inference ☆7,074 · Apr 15, 2026 · Updated 2 weeks ago
- Large Language Model Text Generation Inference ☆10,848 · Mar 21, 2026 · Updated last month
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,110 · Jun 30, 2025 · Updated 10 months ago
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆26,832 · Updated this week
- OpenAI-compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM, and many others) ☆277 · Oct 11, 2023 · Updated 2 years ago
- Vendor-agnostic orchestration for training, inference, and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernete… ☆2,128 · Updated this week
- Efficient, scalable, and enterprise-grade CPU/GPU inference server for Hugging Face transformer models ☆1,687 · Oct 23, 2024 · Updated last year
- Scalable, low-latency, hybrid-enabled vector search in Postgres. Revolutionize vector search, not the database. ☆2,172 · Feb 26, 2025 · Updated last year
- The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more! ☆8,606 · Apr 16, 2026 · Updated 2 weeks ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,836 · Updated this week
- Serving multiple LoRA fine-tuned LLMs as one ☆1,156 · May 8, 2024 · Updated last year
- Large-scale model inference. ☆628 · Sep 12, 2023 · Updated 2 years ago
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆4,036 · Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,909 · Jan 21, 2024 · Updated 2 years ago
- A conversational AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robot… ☆546 · Mar 27, 2026 · Updated last month
- Transformer-related optimization, including BERT and GPT ☆6,415 · Mar 27, 2024 · Updated 2 years ago
- A fast cross-platform AI inference engine using Rust and WebGPU ☆467 · Jan 4, 2025 · Updated last year
- Benchmark for machine learning model online serving (LLM, embedding, Stable Diffusion, Whisper) ☆28 · Jun 28, 2023 · Updated 2 years ago
- Turn PostgreSQL into your search engine in a Pythonic way. ☆52 · Aug 29, 2025 · Updated 8 months ago
- Training and serving large-scale neural networks with auto parallelization. ☆3,187 · Dec 9, 2023 · Updated 2 years ago
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,718 · Apr 9, 2026 · Updated 3 weeks ago
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,585 · Jan 28, 2026 · Updated 3 months ago
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ cl… ☆9,923 · Updated this week
- Docker for your ML/DL models, based on OCI Artifacts ☆474 · Jan 26, 2024 · Updated 2 years ago
- Accessible large language models via k-bit quantization for PyTorch. ☆8,168 · Apr 20, 2026 · Updated 2 weeks ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆179 · Dec 16, 2025 · Updated 4 months ago
- Running large language models on a single GPU for throughput-oriented scenarios. ☆9,366 · Oct 28, 2024 · Updated last year
- Train transformer language models with reinforcement learning. ☆18,193 · Updated this week
- RayLLM - LLMs on Ray (archived). Read the README for more info. ☆1,267 · Mar 13, 2025 · Updated last year
- An awesome & curated list of the best LLMOps tools for developers ☆5,764 · Apr 6, 2026 · Updated 3 weeks ago
- An MLOps framework to package, deploy, monitor, and manage thousands of production machine learning models ☆4,746 · Mar 23, 2026 · Updated last month
- All-in-one platform for search, recommendations, RAG, and analytics, offered via API ☆2,642 · Jan 25, 2026 · Updated 3 months ago
- An open-source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifac… ☆1,339 · Apr 27, 2026 · Updated last week