mosec: a high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources
☆893 · Updated Mar 1, 2026
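mosec's headline feature, dynamic batching, can be illustrated with a toy sketch. This is a generic illustration of the idea, not mosec's actual API: incoming requests queue up and are handed to the model as one batch once the batch is full, which amortizes per-call overhead on the accelerator.

```python
class DynamicBatcher:
    """Toy illustration of dynamic batching (not mosec's API):
    requests accumulate until the batch is full, then the whole
    batch goes to the model in one call. Real servers also flush
    on a max-wait timer so a lone request is never stuck."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.pending = []   # requests waiting to be batched
        self.batches = []   # batches handed to the model

    def submit(self, request) -> None:
        self.pending.append(request)
        if len(self.pending) >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand the accumulated batch to the model in a single call.
        if self.pending:
            self.batches.append(self.pending)
            self.pending = []


batcher = DynamicBatcher(max_batch_size=2)
for req in range(5):
    batcher.submit(req)
batcher.flush()  # drain the leftover request
print(batcher.batches)  # [[0, 1], [2, 3], [4]]
```

In a real serving framework the batcher runs in the request path and the model's `forward` receives the list of pending requests; the timeout knob trades tail latency against batch occupancy.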
Alternatives and similar repositories for mosec
Users interested in mosec are comparing it to the libraries listed below.
- 🏕️ Reproducible development environment for humans and agents ☆2,187 · Updated Mar 5, 2026
- Statistics about AI infrastructure products. ☆16 · Updated Feb 27, 2025
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others) ☆281 · Updated Nov 3, 2023
- An efficient binary serialization format for numerical data. ☆18 · Updated Nov 3, 2025
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. ☆10,446 · Updated this week
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali. ☆2,724 · Updated Feb 5, 2026
- Fast, flexible LLM inference ☆6,713 · Updated Mar 15, 2026
- Large Language Model Text Generation Inference ☆10,812 · Updated Jan 8, 2026
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆24,829 · Updated this week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,101 · Updated Jun 30, 2025
- OpenAI-compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM, and many others) ☆277 · Updated Oct 11, 2023
- dstack is an open-source control plane for running development, training, and inference jobs on GPUs, across hyperscalers, neoclouds, or o… ☆2,069 · Updated this week
- Efficient, scalable, and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,687 · Updated Oct 23, 2024
- Scalable, low-latency, and hybrid-enabled vector search in Postgres. Revolutionize vector search, not the database. ☆2,162 · Updated Feb 26, 2025
- The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more! ☆8,520 · Updated Mar 16, 2026
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,711 · Updated this week
- Serving multiple LoRA-finetuned LLMs as one ☆1,148 · Updated May 8, 2024
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,958 · Updated this week
- Large-scale model inference. ☆627 · Updated Sep 12, 2023
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,903 · Updated Jan 21, 2024
- A conversational AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robot… ☆543 · Updated Feb 25, 2025
- Transformer-related optimization, including BERT and GPT ☆6,400 · Updated Mar 27, 2024
- A fast cross-platform AI inference engine 🤖 using Rust 🦀 and WebGPU 🎮 ☆464 · Updated Jan 4, 2025
- Benchmark for machine learning model online serving (LLM, embedding, Stable Diffusion, Whisper) ☆28 · Updated Jun 28, 2023
- Turn PostgreSQL into your search engine in a Pythonic way. ☆51 · Updated Aug 29, 2025
- Training and serving large-scale neural networks with auto-parallelization. ☆3,187 · Updated Dec 9, 2023
- AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,709 · Updated Mar 16, 2026
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,585 · Updated Jan 28, 2026
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ cl… ☆9,664 · Updated this week
- Docker for your ML/DL models, based on OCI Artifacts ☆472 · Updated Jan 26, 2024
- Accessible large language models via k-bit quantization for PyTorch. ☆8,052 · Updated this week
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆179 · Updated Dec 16, 2025
- Running large language models on a single GPU for throughput-oriented scenarios. ☆9,379 · Updated Oct 28, 2024
- Train transformer language models with reinforcement learning. ☆17,697 · Updated this week
- RayLLM - LLMs on Ray (archived; see the README for more info). ☆1,266 · Updated Mar 13, 2025
- An awesome & curated list of the best LLMOps tools for developers ☆5,668 · Updated Feb 3, 2026
- An MLOps framework to package, deploy, monitor, and manage thousands of production machine learning models ☆4,735 · Updated this week
- All-in-one platform for search, recommendations, RAG, and analytics, offered via API ☆2,612 · Updated Jan 25, 2026
- An open-source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifac… ☆1,315 · Updated this week
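Several entries above (S-LoRA, the multi-LoRA server) revolve around serving many LoRA adapters on one base model. The reason this scales is that a LoRA adapter is only a low-rank update, y = W·x + (α/r)·B·(A·x): the large base weight W is shared across all adapters, and each adapter contributes just the small A and B matrices. A minimal plain-Python sketch of the arithmetic (function names here are illustrative, not any library's API):

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W @ x + (alpha / r) * B @ (A @ x), with rank r = rows of A.
    W stays frozen and shared; each adapter owns only A (r x d_in)
    and B (d_out x r), which is tiny compared to W."""
    r = len(A)
    base = matvec(W, x)               # shared base-model computation
    update = matvec(B, matvec(A, x))  # small per-adapter computation
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# One shared base weight, one rank-1 adapter (A: 1x2, B: 2x1).
W = [[1, 0], [0, 1]]
A, B = [[1, 1]], [[1], [0]]
print(lora_forward(W, A, B, [2, 3]))  # [7.0, 3.0]
```

Serving thousands of adapters then amounts to routing each request to its adapter's A/B pair while batching the shared W·x computation, which is the scheduling problem S-LoRA addresses.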