Cost-efficient and pluggable Infrastructure components for GenAI inference
☆4,650 · Feb 27, 2026 · Updated this week
Alternatives and similar repositories for aibrix
Users interested in aibrix are comparing it to the libraries listed below.
- A Datacenter Scale Distributed Inference Serving Framework · ☆6,154 · Updated this week
- vLLM's reference system for K8S-native cluster-wide deployment with community-driven performance optimization · ☆2,187 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. · ☆23,905 · Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer · ☆7,272 · Updated this week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication · ☆673 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆71,234 · Updated this week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. · ☆4,843 · Updated this week
- Gateway API Inference Extension · ☆597 · Updated this week
- A toolkit to run Ray applications on Kubernetes · ☆2,355 · Updated this week
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, o… · ☆9,516 · Updated this week
- Composable building blocks to build LLM Apps · ☆8,278 · Updated this week
- FlashInfer: Kernel Library for LLM Serving · ☆5,057 · Updated this week
- A high-performance distributed file system designed to address the challenges of AI training and inference workloads. · ☆9,730 · Feb 25, 2026 · Updated last week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… · ☆12,938 · Feb 25, 2026 · Updated last week
- Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes · ☆5,135 · Updated this week
- DeepEP: an efficient expert-parallel communication library · ☆9,005 · Feb 9, 2026 · Updated 3 weeks ago
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM. · ☆53,029 · Updated this week
- Achieve state-of-the-art inference performance with modern accelerators on Kubernetes · ☆2,543 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) · ☆898 · Updated this week
- Easily fine-tune, evaluate, and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM! · ☆8,901 · Updated this week
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation · ☆7,970 · May 15, 2025 · Updated 9 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs · ☆19,519 · Updated this week
- KV cache store for distributed LLM inference · ☆392 · Nov 13, 2025 · Updated 3 months ago
- Heterogeneous AI Computing Virtualization Middleware (Project under CNCF) · ☆3,047 · Updated this week
- FlashMLA: Efficient Multi-head Latent Attention Kernels · ☆12,505 · Feb 6, 2026 · Updated 3 weeks ago
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… · ☆37,083 · Updated this week
- Large Language Model Text Generation Inference · ☆10,788 · Jan 8, 2026 · Updated last month
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. · ☆3,803 · Feb 25, 2026 · Updated last week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM · ☆2,787 · Updated this week
- AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-te… · ☆1,158 · Feb 23, 2026 · Updated last week
- A Cloud Native Batch System (Project under CNCF) · ☆5,352 · Updated this week
- Build, run, manage agentic software at scale. · ☆38,276 · Updated this week
- Fast, flexible LLM inference · ☆6,623 · Updated this week
- KAI Scheduler is an open-source Kubernetes-native scheduler for AI workloads at large scale · ☆1,144 · Updated this week
- Universal memory layer for AI Agents · ☆47,994 · Feb 23, 2026 · Updated last week
- DSPy: The framework for programming—not prompting—language models · ☆32,381 · Feb 24, 2026 · Updated last week
- A lightweight data processing framework built on DuckDB and 3FS. · ☆4,931 · Mar 5, 2025 · Updated 11 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. · ☆7,645 · Updated this week
- Official inference framework for 1-bit LLMs · ☆28,640 · Feb 3, 2026 · Updated last month