A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
★ 902 · Jun 25, 2026 · Updated last week
Alternatives and similar repositories for mosec
Users who are interested in mosec are comparing it to the libraries listed below.
- Reproducible development environment for humans and agents (★ 2,211 · May 21, 2026 · Updated last month)
- This repository contains statistics about the AI Infrastructure products. (★ 17 · Feb 27, 2025 · Updated last year)
- Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) (★ 283 · Nov 3, 2023 · Updated 2 years ago)
- An efficient binary serialization format for numerical data. (★ 18 · Nov 3, 2025 · Updated 8 months ago)
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. (★ 10,796 · Updated this week)
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali (★ 2,857 · Mar 24, 2026 · Updated 3 months ago)
- Fast, flexible LLM inference (★ 7,410 · Updated this week)
- Large Language Model Text Generation Inference (★ 10,862 · Mar 21, 2026 · Updated 3 months ago)
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (★ 2,107 · Jun 30, 2025 · Updated last year)
- SGLang is a high-performance serving framework for large language models and multimodal models. (★ 29,694 · Jun 27, 2026 · Updated last week)
- OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others) (★ 277 · Oct 11, 2023 · Updated 2 years ago)
- Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernete… (★ 2,168 · Updated this week)
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models (★ 1,689 · Oct 23, 2024 · Updated last year)
- Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database. (★ 2,175 · Feb 26, 2025 · Updated last year)
- The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! (★ 8,697 · Jun 22, 2026 · Updated last week)
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. (★ 7,928 · Updated this week)
- Serving multiple LoRA finetuned LLM as one (★ 1,163 · May 8, 2024 · Updated 2 years ago)
- Large-scale model inference. (★ 629 · Sep 12, 2023 · Updated 2 years ago)
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… (★ 4,141 · Jun 26, 2026 · Updated last week)
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (★ 1,914 · Jan 21, 2024 · Updated 2 years ago)
- A conversational, AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robot… (★ 548 · Mar 27, 2026 · Updated 3 months ago)
- Transformer related optimization, including BERT, GPT (★ 6,433 · Mar 27, 2024 · Updated 2 years ago)
- a fast cross platform AI inference engine using Rust and WebGPU (★ 469 · Jan 4, 2025 · Updated last year)
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) (★ 28 · Jun 28, 2023 · Updated 3 years ago)
- Turn PostgreSQL into your search engine in a Pythonic way. (★ 52 · Aug 29, 2025 · Updated 10 months ago)
- Training and serving large-scale neural networks with auto parallelization. (★ 3,179 · Dec 9, 2023 · Updated 2 years ago)
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… (★ 4,723 · Apr 9, 2026 · Updated 2 months ago)
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… (★ 1,585 · Jan 28, 2026 · Updated 5 months ago)
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ cl… (★ 10,224 · Jun 27, 2026 · Updated last week)
- Docker for Your ML/DL Models Based on OCI Artifacts (★ 473 · Jan 26, 2024 · Updated 2 years ago)
- Accessible large language models via k-bit quantization for PyTorch. (★ 8,305 · Updated this week)
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… (★ 179 · Dec 16, 2025 · Updated 6 months ago)
- Running large language models on a single GPU for throughput-oriented scenarios. (★ 9,362 · Oct 28, 2024 · Updated last year)
- Train transformer language models with reinforcement learning. (★ 18,735 · Updated this week)
- RayLLM - LLMs on Ray (Archived). Read README for more info. (★ 1,263 · Mar 13, 2025 · Updated last year)
- An awesome & curated list of best LLMOps tools for developers (★ 5,866 · May 21, 2026 · Updated last month)
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models (★ 4,755 · Mar 23, 2026 · Updated 3 months ago)
- All-in-one platform for search, recommendations, RAG, and analytics offered via API (★ 2,684 · Jan 25, 2026 · Updated 5 months ago)
- An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifac… (★ 1,380 · Jun 25, 2026 · Updated last week)