run-ai / runai-model-streamer
☆214 · Updated this week
Alternatives and similar repositories for runai-model-streamer
Users interested in runai-model-streamer are comparing it to the libraries listed below
- Module, Model, and Tensor Serialization/Deserialization · ☆232 · Updated last week
- CUDA checkpoint and restore utility · ☆339 · Updated 4 months ago
- Inference server benchmarking tool · ☆64 · Updated last month
- ☆260 · Updated 2 weeks ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆308 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference · ☆60 · Updated 3 weeks ago
- ☆34 · Updated last week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray · ☆127 · Updated last month
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … · ☆169 · Updated this week
- OpenAI compatible API for TensorRT LLM triton backend · ☆207 · Updated 10 months ago
- ☆308 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆262 · Updated 7 months ago
- Benchmark suite for LLMs from Fireworks.ai · ☆75 · Updated 2 weeks ago
- High-performance safetensors model loader · ☆34 · Updated last week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. · ☆137 · Updated 10 months ago
- PyTorch per step fault tolerance (actively under development) · ☆302 · Updated this week
- ☆49 · Updated 2 months ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… · ☆62 · Updated 2 weeks ago
- SGLang is a fast serving framework for large language models and vision language models. · ☆23 · Updated 3 months ago
- Google TPU optimizations for transformers models · ☆112 · Updated 4 months ago
- Perplexity GPU Kernels · ☆318 · Updated last week
- Repository for open inference protocol specification · ☆56 · Updated 2 weeks ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… · ☆301 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server. · ☆30 · Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. · ☆67 · Updated last year
- Where GPUs get cooked 👩‍🍳🔥 · ☆229 · Updated 2 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs · ☆362 · Updated this week
- KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale · ☆590 · Updated this week
- A collection of all available inference solutions for the LLMs · ☆88 · Updated 3 months ago
- xet client tech, used in huggingface_hub · ☆103 · Updated last week