run-ai / runai-model-streamer
☆275 · Updated last week
Alternatives and similar repositories for runai-model-streamer
Users interested in runai-model-streamer are comparing it to the libraries listed below.
- Module, Model, and Tensor Serialization/Deserialization ☆283 · Updated 4 months ago
- CUDA checkpoint and restore utility ☆401 · Updated 3 months ago
- Inference server benchmarking tool ☆136 · Updated 3 months ago
- ☆320 · Updated last year
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆368 · Updated last week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆355 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆468 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆467 · Updated 2 weeks ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆799 · Updated this week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆131 · Updated 3 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆190 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last month
- High-performance safetensors model loader ☆92 · Updated 3 weeks ago
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆138 · Updated this week
- ☆322 · Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆398 · Updated last week
- ☆60 · Updated last year
- Where GPUs get cooked 👩🍳🔥 ☆347 · Updated 3 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆84 · Updated last month
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆325 · Updated 3 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆247 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) ☆801 · Updated this week
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆744 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆132 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆233 · Updated this week
- 👷 Build compute kernels ☆201 · Updated this week
- A tool to configure, launch and manage your machine learning experiments ☆213 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 3 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆251 · Updated this week
- A Lossless Compression Library for AI pipelines ☆290 · Updated 6 months ago