run-ai / runai-model-streamer
☆190 · Updated last week
Alternatives and similar repositories for runai-model-streamer:
Users who are interested in runai-model-streamer are comparing it to the libraries listed below.
- Module, Model, and Tensor Serialization/Deserialization ☆221 · Updated last month
- CUDA checkpoint and restore utility ☆322 · Updated 2 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆248 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated 2 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆330 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 6 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆124 · Updated last week
- Inference server benchmarking tool ☆48 · Updated last week
- Perplexity GPU Kernels ☆185 · Updated this week
- PyTorch per step fault tolerance (actively under development) ☆273 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 3 months ago
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆136 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆65 · Updated last year
- OpenAI compatible API for TensorRT LLM triton backend ☆204 · Updated 8 months ago
- Google TPU optimizations for transformers models ☆107 · Updated 2 months ago
- ☆205 · Updated 2 months ago
- Self-host LLMs with vLLM and BentoML ☆100 · Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale ☆450 · Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆313 · Updated this week
- ☆301 · Updated 7 months ago
- ☆30 · Updated 2 weeks ago
- ☆241 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server. ☆25 · Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆135 · Updated 8 months ago
- A tool to configure, launch and manage your machine learning experiments. ☆135 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆238 · Updated this week
- Where GPUs get cooked 👩🍳🔥 ☆221 · Updated last month
- LLM KV cache compression made easy ☆452 · Updated 3 weeks ago
- A top-like tool for monitoring GPUs in a cluster ☆86 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆98 · Updated this week