vast-data / VUA
VUA stands for 'VAST Undivided Attention'. It is a global KV-cache storage solution that optimizes LLM time to first token (TTFT) and GPU utilization.
☆14 · Updated this week
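The description above says VUA is a global KV-cache store aimed at cutting time to first token. The snippet below is a minimal, self-contained sketch of the general idea behind such stores: reuse cached key/value tensors for a shared prompt prefix so the server only has to prefill the uncached suffix. All names here (`PrefixKVStore`, `prefill_with_cache`) are hypothetical illustrations, not VUA's actual API.

```python
# Hypothetical sketch of prefix KV-cache reuse; not VUA's actual API.
# Idea: before prefill, look up the longest cached token prefix and only
# compute keys/values for the uncached suffix, which shortens TTFT.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
import hashlib


@dataclass
class KVEntry:
    """Opaque stand-in for the key/value tensors of one cached prefix."""
    tokens: Tuple[int, ...]
    kv_blob: bytes  # placeholder for serialized KV tensors


@dataclass
class PrefixKVStore:
    """Toy global KV-cache store keyed by a hash of the token prefix."""
    _store: Dict[str, KVEntry] = field(default_factory=dict)

    @staticmethod
    def _key(tokens: Tuple[int, ...]) -> str:
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def put(self, tokens: List[int], kv_blob: bytes) -> None:
        t = tuple(tokens)
        self._store[self._key(t)] = KVEntry(t, kv_blob)

    def longest_prefix(self, tokens: List[int]) -> Optional[KVEntry]:
        """Return the entry for the longest cached token prefix, if any."""
        for end in range(len(tokens), 0, -1):
            entry = self._store.get(self._key(tuple(tokens[:end])))
            if entry is not None:
                return entry
        return None


def prefill_with_cache(store: PrefixKVStore, prompt_tokens: List[int]) -> int:
    """Return how many tokens still need prefill after consulting the cache."""
    hit = store.longest_prefix(prompt_tokens)
    cached = len(hit.tokens) if hit else 0
    return len(prompt_tokens) - cached


if __name__ == "__main__":
    store = PrefixKVStore()
    system_prompt = [1, 2, 3, 4]              # tokens shared across requests
    store.put(system_prompt, kv_blob=b"...")  # populated by an earlier request
    request = system_prompt + [9, 8, 7]       # new request reusing the prefix
    print("tokens left to prefill:", prefill_with_cache(store, request))  # -> 3
```

In a real serving stack the KV blobs would live in a shared or tiered storage backend rather than an in-process Python dict; the lookup-before-prefill flow is the part this sketch is meant to illustrate.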
Alternatives and similar repositories for VUA
Users interested in VUA are comparing it to the libraries listed below.
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆375 · Updated this week
- ☆221 · Updated this week
- A toolkit for discovering cluster network topology. ☆54 · Updated last week
- NVIDIA Inference Xfer Library (NIXL) ☆413 · Updated this week
- KV cache store for distributed LLM inference ☆269 · Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆97 · Updated this week
- Serverless LLM Serving for Everyone. ☆488 · Updated this week
- Systematic and comprehensive benchmarks for LLM systems. ☆17 · Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆114 · Updated this week
- Helm charts for llm-d ☆42 · Updated this week
- High-performance safetensors model loader ☆39 · Updated 2 weeks ago
- Efficient and easy multi-instance LLM serving ☆437 · Updated this week
- CUDA checkpoint and restore utility ☆345 · Updated 4 months ago
- A tool to detect infrastructure issues on cloud native AI systems ☆39 · Updated last month
- ☆62 · Updated 4 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆129 · Updated last month
- The driver for LMCache core to run in vLLM ☆41 · Updated 4 months ago
- All-in-Storage Solution based on DiskANN for DRAM-free Approximate Nearest Neighbor Search ☆57 · Updated 4 months ago
- An I/O benchmark for deep learning applications ☆87 · Updated this week
- An Open Source, Cloud-native AI Infrastructure Platform. Not Just GPUs. ☆42 · Updated 3 weeks ago
- Inference scheduler for llm-d ☆56 · Updated this week
- ☆310 · Updated 10 months ago
- Run Slurm on Kubernetes. A Slinky project. ☆119 · Updated 2 weeks ago
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆204 · Updated this week
- ☆47 · Updated 11 months ago
- ☆36 · Updated this week
- ScalarLM - a unified training and inference stack ☆39 · Updated last month
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes ☆98 · Updated 2 months ago
- Route LLM requests to the best model for the task at hand. ☆66 · Updated this week
- Perplexity GPU Kernels ☆364 · Updated last week