mosecorg / mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute hardware
☆793 · Updated this week
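A minimal sketch of how the advertised dynamic batching fits into mosec's documented `Worker`/`Server` API; the model call is a placeholder, and the `num` and `max_batch_size` values are illustrative assumptions, not defaults:

```python
from mosec import Server, Worker


class Inference(Worker):
    """Pipeline stage; with max_batch_size > 1, mosec dynamically
    batches waiting requests and passes them in as a list."""

    def forward(self, data: list[dict]) -> list[dict]:
        # Placeholder inference logic -- swap in your own model call.
        return [{"length": len(req.get("text", ""))} for req in data]


if __name__ == "__main__":
    server = Server()
    # num controls parallel worker processes; max_batch_size enables
    # dynamic batching (both values here are arbitrary examples).
    server.append_worker(Inference, num=2, max_batch_size=8)
    server.run()
```

The server then accepts JSON payloads over HTTP (by default at `POST /inference`), and the framework handles queueing and batch assembly across concurrent requests.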
Related projects
Alternatives and complementary repositories for mosec
- ☆411 · Updated last year
- Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python. ☆553 · Updated this week
- Serves multiple LoRA fine-tuned LLMs as one ☆989 · Updated 6 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆1,908 · Updated this week
- RayLLM - LLMs on Ray ☆1,236 · Updated 5 months ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,660 · Updated last month
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ☆744 · Updated this week
- A blazing-fast inference solution for text embedding models ☆2,857 · Updated 2 weeks ago
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali ☆1,482 · Updated this week
- 🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools ☆2,579 · Updated this week
- A high-performance inference system for large language models, designed for production environments. ☆394 · Updated this week
- The Triton TensorRT-LLM Backend ☆710 · Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,758 · Updated 10 months ago
- FlashInfer: Kernel Library for LLM Serving ☆1,461 · Updated this week
- Common source, scripts and utilities for creating Triton backends. ☆295 · Updated this week
- ☆193 · Updated this week
- Visualize HNSW, Faiss, and other ANN indexes ☆401 · Updated last year
- Extend existing LLMs far beyond their original training length with constant memory usage, without retraining ☆677 · Updated 7 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆640 · Updated 2 months ago
- A survey of AI startups ☆393 · Updated last year
- LLMPerf is a library for validating and benchmarking LLMs ☆648 · Updated 3 months ago
- A tiny library for coding with large language models. ☆1,215 · Updated 4 months ago
- LLM inference benchmark ☆350 · Updated 4 months ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,206 · Updated this week
- Efficient AI Inference & Serving ☆458 · Updated 10 months ago
- One-click machine learning deployment (LLM, text-to-image and so on) at scale on any cluster (GCP, AWS, Lambda labs, your home lab, or ev… ☆239 · Updated last year
- Fast Inference Solutions for BLOOM ☆560 · Updated last month
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆26 · Updated last year
- Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of the Triton Inference Serv… ☆434 · Updated this week