vllm-project / aibrix
Cost-efficient and pluggable Infrastructure components for GenAI inference
☆3,484 Updated this week
Alternatives and similar repositories for aibrix:
Users who are interested in aibrix are comparing it to the libraries listed below.
- A Datacenter Scale Distributed Inference Serving Framework ☆3,849 Updated this week
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization ☆1,105 Updated this week
- A lightweight data processing framework built on DuckDB and 3FS. ☆4,580 Updated last month
- Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth! ☆3,763 Updated this week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆7,134 Updated this week
- Sky-T1: Train your own O1 preview model within $450 ☆3,220 Updated this week
- Redis for LLMs ☆834 Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,848 Updated 8 months ago
- A high-performance distributed file system designed to address the challenges of AI training and inference workloads. ☆8,768 Updated this week
- Agent Framework / shim to use Pydantic with LLMs ☆8,893 Updated this week
- AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-te… ☆910 Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,962 Updated this week
- An open source deep research clone. AI Agent that reasons large amounts of web data extracted with Firecrawl ☆5,435 Updated 2 months ago
- FlashInfer: Kernel Library for LLM Serving ☆2,731 Updated this week
- Build effective agents using Model Context Protocol and simple workflow patterns ☆4,029 Updated this week
- Composable building blocks to build Llama Apps ☆7,714 Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,251 Updated this week
- Democratizing Reinforcement Learning for LLMs ☆3,123 Updated 2 weeks ago
- ☆2,973 Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆13,544 Updated this week
- The python library for real-time communication ☆3,750 Updated this week
- Build Real-Time Knowledge Graphs for AI Agents ☆5,218 Updated this week
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. ☆2,738 Updated last month
- Flexible and powerful framework for managing multiple AI agents and handling complex conversations ☆4,744 Updated last week
- Official Implementation of "KBLaM: Knowledge Base augmented Language Model" ☆1,260 Updated last week
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ☆2,653 Updated this week
- Fast, Flexible and Portable Structured Generation ☆888 Updated 2 weeks ago
- CUDA Python: Performance meets Productivity ☆2,372 Updated this week
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention ☆2,540 Updated 2 weeks ago
- A language model programming library. ☆5,749 Updated 2 months ago