run-ai / runai-model-streamer

☆257

Alternatives and similar repositories for runai-model-streamer

Users who are interested in runai-model-streamer are comparing it to the libraries listed below.

coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆270 Updated 2 months ago
imbue-ai / cluster-health
☆316 Updated last year
NVIDIA / cuda-checkpoint
CUDA checkpoint and restore utility
☆376 Updated last month
sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆292 Updated this week
huggingface / inference-benchmarker
Inference server benchmarking tool
☆118 Updated 3 weeks ago
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆655 Updated this week
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆420 Updated this week
leptonai / gpud
GPUd automates monitoring, diagnostics, and issue identification for GPUs
☆440 Updated this week
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 Updated last year
run-ai / genv
GPU environment and cluster management with LLM support
☆646 Updated last year
zipnn / zipnn
A Lossless Compression Library for AI pipelines
☆282 Updated 3 months ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆82 Updated last week
AI-Hypercomputer / JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel…
☆384 Updated 4 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆283 Updated this week
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆131 Updated last month
foundation-model-stack / fastsafetensors
High-performance safetensors model loader
☆67 Updated this week
triton-inference-server / vllm_backend
☆302 Updated this week
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆91 Updated 7 months ago
vllm-project / recipes
Common recipes to run vLLM
☆172 Updated this week
huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
☆293 Updated last month
bentoml / llm-bench
☆56 Updated 11 months ago
NVIDIA-NeMo / Run
A tool to configure, launch and manage your machine learning experiments.
☆200 Updated last week
asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆73 Updated last year
run-ai / llmperf
☆58 Updated last year
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆120 Updated 9 months ago
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆60 Updated this week
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆318 Updated 3 weeks ago
npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆215 Updated last year
apple / ml-recurrent-drafter
☆218 Updated 9 months ago
lapp0 / lm-inference-engines
Comparison of Language Model Inference Engines
☆231 Updated 10 months ago