run-ai / runai-model-streamer
☆221 · Updated this week
Alternatives and similar repositories for runai-model-streamer
Users interested in runai-model-streamer are comparing it to the libraries listed below.
- Module, Model, and Tensor Serialization/Deserialization · ☆240 · Updated last week
- Inference server benchmarking tool · ☆73 · Updated last month
- CUDA checkpoint and restore utility · ☆345 · Updated 4 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray · ☆129 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆264 · Updated 8 months ago
- PyTorch per-step fault tolerance (actively under development) · ☆304 · Updated last week
- ☆264 · Updated last week
- Where GPUs get cooked 👩‍🍳🔥 · ☆234 · Updated 3 months ago
- Benchmark suite for LLMs from Fireworks.ai · ☆76 · Updated 2 weeks ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs · ☆375 · Updated this week
- ☆55 · Updated 9 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … · ☆177 · Updated 2 weeks ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆348 · Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment · ☆114 · Updated this week
- High-performance safetensors model loader · ☆39 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving · ☆67 · Updated last year
- ☆36 · Updated this week
- ☆310 · Updated 10 months ago
- ☆49 · Updated 3 months ago
- ☆155 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference · ☆60 · Updated last month
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ☆253 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) · ☆413 · Updated this week
- A top-like tool for monitoring GPUs in a cluster · ☆85 · Updated last year
- OpenAI compatible API for TensorRT LLM triton backend · ☆209 · Updated 10 months ago
- Load compute kernels from the Hub · ☆172 · Updated this week
- NVIDIA NCCL Tests for Distributed Training · ☆97 · Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale · ☆656 · Updated this week
- Controller for ModelMesh · ☆232 · Updated last week
- ☆194 · Updated last month