mosecorg / mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute hardware
☆840 · Updated last week
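For a sense of the interface, here is a minimal sketch of a mosec service with dynamic batching, based on mosec's documented Python API; the `Inference` worker and its echo logic are illustrative placeholders, not a real model:

```python
from mosec import Server, Worker


class Inference(Worker):
    """Placeholder worker; swap the echo logic for a real model call."""

    def forward(self, data: list) -> list:
        # With max_batch_size > 1, mosec delivers a list of requests that
        # were dynamically batched within the wait window and expects a
        # list of responses in the same order.
        return [{"echo": item} for item in data]


if __name__ == "__main__":
    server = Server()
    # Each append_worker call adds one pipeline stage: `num` controls the
    # number of worker processes, `max_batch_size` enables dynamic batching.
    server.append_worker(Inference, num=2, max_batch_size=16)
    server.run()
```

Stages appended one after another form the CPU/GPU pipeline, e.g. a CPU-bound preprocessing worker followed by a GPU-bound inference worker, each scaled independently.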
Alternatives and similar repositories for mosec:
Users interested in mosec are comparing it to the libraries listed below.
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments (see the sketch after this list). ☆790 · Updated 2 months ago
- RayLLM - LLMs on Ray (Archived). Read the README for more info. ☆1,261 · Updated last month
- ☆411 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,009 · Updated last month
- The Triton TensorRT-LLM Backend ☆832 · Updated this week
- Fast Inference Solutions for BLOOM ☆561 · Updated 7 months ago
- Large-scale model inference. ☆629 · Updated last year
- Serving multiple LoRA fine-tuned LLMs as one ☆1,058 · Updated last year
- Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python. ☆608 · Updated this week
- A high-performance inference system for large language models, designed for production environments. ☆437 · Updated this week
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,683 · Updated 6 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,145 · Updated last week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,386 · Updated last year
- LLMPerf is a library for validating and benchmarking LLMs ☆900 · Updated 5 months ago
- This repository contains tutorials and examples for Triton Inference Server ☆695 · Updated 3 weeks ago
- ☆462 · Updated last month
- Efficient AI Inference & Serving ☆470 · Updated last year
- Bagua Speeds up PyTorch ☆883 · Updated 9 months ago
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali ☆2,138 · Updated 3 weeks ago
- ☆1,026 · Updated last year
- Model Deployment at Scale on Kubernetes 🦄️ ☆811 · Updated last year
- Triton Model Analyzer is a CLI tool for understanding the compute and memory requirements of Triton Inference Server models. ☆474 · Updated 2 weeks ago
- Finetuning Large Language Models on One Consumer GPU in 2 Bits ☆723 · Updated 11 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ☆1,514 · Updated last month
- A Python vector database you just need - no more, no less. ☆610 · Updated last year
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆205 · Updated 9 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆2,991 · Updated this week
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ☆1,792 · Updated this week
- Open Academic Research on Improving LLaMA to SOTA LLM ☆1,620 · Updated last year
- LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions ☆820 · Updated 2 years ago
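For the PyTriton entry above, a minimal sketch of its Flask/FastAPI-like binding style, based on PyTriton's documented API; the model name, tensor names, and the doubling function are illustrative placeholders:

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(input_1):
    # Placeholder "model": doubles the batched input array.
    return {"output_1": input_1 * 2.0}


with Triton() as triton:
    # bind() registers a Python callable as a Triton model, much like
    # registering a route in Flask/FastAPI.
    triton.bind(
        model_name="Doubler",
        infer_func=infer_fn,
        inputs=[Tensor(name="input_1", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output_1", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=16),
    )
    triton.serve()
```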