A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
⭐ 897 · Apr 1, 2026 · Updated last week
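The headline feature of mosec is dynamic batching: incoming requests are grouped until either a size limit or a wait deadline is hit, then processed together. The sketch below is a minimal, framework-agnostic illustration of that idea in plain Python (it does not use mosec's actual API; the function name and parameters are hypothetical):

```python
import queue
import time


def dynamic_batcher(requests, max_batch_size=4, max_wait_s=0.01):
    """Group requests into batches: flush a batch when it is full or
    when the oldest request in it has waited max_wait_s seconds."""
    q = queue.Queue()
    for r in requests:
        q.put(r)

    batches = []
    while not q.empty():
        # Start a new batch with the next pending request.
        batch = [q.get()]
        deadline = time.monotonic() + max_wait_s
        # Keep filling until the batch is full or the deadline passes.
        while len(batch) < max_batch_size and time.monotonic() < deadline:
            try:
                remaining = max(0.0, deadline - time.monotonic())
                batch.append(q.get(timeout=remaining))
            except queue.Empty:
                break
        batches.append(batch)
    return batches
```

In a real serving framework the queue is fed concurrently by request handlers and the batch is forwarded to the model; the size/latency trade-off is exposed through parameters analogous to `max_batch_size` and `max_wait_s` here.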
Alternatives and similar repositories for mosec
Users interested in mosec are comparing it to the libraries listed below.
- Reproducible development environment for humans and agents · ⭐ 2,192 · Apr 6, 2026 · Updated last week
- This repository contains statistics about the AI Infrastructure products. · ⭐ 17 · Feb 27, 2025 · Updated last year
- Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) · ⭐ 282 · Nov 3, 2023 · Updated 2 years ago
- An efficient binary serialization format for numerical data. · ⭐ 18 · Nov 3, 2025 · Updated 5 months ago
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. · ⭐ 10,533 · Updated this week
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali · ⭐ 2,752 · Mar 24, 2026 · Updated 3 weeks ago
- Fast, flexible LLM inference · ⭐ 6,928 · Updated this week
- Large Language Model Text Generation Inference · ⭐ 10,830 · Mar 21, 2026 · Updated 3 weeks ago
- SGLang is a high-performance serving framework for large language models and multimodal models. · ⭐ 25,643 · Updated this week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. · ⭐ 2,107 · Jun 30, 2025 · Updated 9 months ago
- OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others) · ⭐ 277 · Oct 11, 2023 · Updated 2 years ago
- Control plane for agents and engineers to provision compute and run training and inference across NVIDIA, AMD, TPU, and Tenstorrent GPUs… · ⭐ 2,090 · Updated this week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for Hugging Face transformer models · ⭐ 1,688 · Oct 23, 2024 · Updated last year
- Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database. · ⭐ 2,166 · Feb 26, 2025 · Updated last year
- The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! · ⭐ 8,563 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. · ⭐ 7,775 · Updated this week
- Serving multiple LoRA finetuned LLM as one · ⭐ 1,152 · May 8, 2024 · Updated last year
- Large-scale model inference. · ⭐ 628 · Sep 12, 2023 · Updated 2 years ago
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… · ⭐ 3,997 · Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters · ⭐ 1,907 · Jan 21, 2024 · Updated 2 years ago
- A conversational, AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robot… · ⭐ 543 · Mar 27, 2026 · Updated 2 weeks ago
- Transformer related optimization, including BERT, GPT · ⭐ 6,412 · Mar 27, 2024 · Updated 2 years ago
- a fast cross platform AI inference engine using Rust and WebGPU · ⭐ 465 · Jan 4, 2025 · Updated last year
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) · ⭐ 28 · Jun 28, 2023 · Updated 2 years ago
- Turn PostgreSQL into your search engine in a Pythonic way. · ⭐ 52 · Aug 29, 2025 · Updated 7 months ago
- Training and serving large-scale neural networks with auto parallelization. · ⭐ 3,187 · Dec 9, 2023 · Updated 2 years ago
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… · ⭐ 4,716 · Apr 7, 2026 · Updated last week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… · ⭐ 1,585 · Jan 28, 2026 · Updated 2 months ago
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ cl… · ⭐ 9,822 · Updated this week
- Docker for Your ML/DL Models Based on OCI Artifacts · ⭐ 474 · Jan 26, 2024 · Updated 2 years ago
- Accessible large language models via k-bit quantization for PyTorch. · ⭐ 8,107 · Updated this week
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… · ⭐ 179 · Dec 16, 2025 · Updated 3 months ago
- Running large language models on a single GPU for throughput-oriented scenarios. · ⭐ 9,375 · Oct 28, 2024 · Updated last year
- Train transformer language models with reinforcement learning. · ⭐ 17,967 · Apr 7, 2026 · Updated last week
- RayLLM - LLMs on Ray (Archived). Read README for more info. · ⭐ 1,267 · Mar 13, 2025 · Updated last year
- An awesome & curated list of best LLMOps tools for developers · ⭐ 5,711 · Apr 6, 2026 · Updated last week
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models · ⭐ 4,740 · Mar 23, 2026 · Updated 3 weeks ago
- An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifac… · ⭐ 1,331 · Updated this week
- All-in-one platform for search, recommendations, RAG, and analytics offered via API · ⭐ 2,640 · Jan 25, 2026 · Updated 2 months ago