qoofyk / LLM_Sizing_Guide
A calculator to estimate the memory footprint, capacity, and latency of LLM deployments on VMware Private AI with NVIDIA.
☆37 · Updated 3 months ago
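For a sense of the arithmetic such a sizing calculator performs, here is a minimal sketch that estimates weight memory, KV-cache memory, and a bandwidth-bound decode latency from first principles. The model shapes (a hypothetical 70B model with grouped-query attention) and the 3350 GB/s HBM bandwidth figure are illustrative assumptions, not values taken from LLM_Sizing_Guide itself.

```python
# Minimal sketch of LLM sizing arithmetic (illustrative values; not the
# formulas used by LLM_Sizing_Guide itself).

def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Weight footprint: parameter count x precision width (FP16 = 2 bytes)."""
    return params_billions * bytes_per_param

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x precision."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

# Hypothetical 70B model with GQA (8 KV heads), served in FP16.
weights = weight_memory_gb(70)                       # 140 GB
kv = kv_cache_gb(80, 8, 128, seq_len=4096, batch=8)  # ~10.7 GB

# Decode is typically memory-bandwidth-bound: each generated token must
# stream the weights (plus KV cache) from HBM once.
hbm_bandwidth_gbps = 3350  # assumed H100-class HBM bandwidth, GB/s
ms_per_token = (weights + kv) / hbm_bandwidth_gbps * 1000
print(f"weights {weights:.0f} GB, KV {kv:.1f} GB, ~{ms_per_token:.0f} ms/token")
```

Real sizing tools refine this with tensor-parallel splits, activation overheads, and prefill compute, but the back-of-envelope version already shows why weight precision and context length dominate capacity planning.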
Alternatives and similar repositories for LLM_Sizing_Guide
Users interested in LLM_Sizing_Guide are comparing it to the repositories listed below.
- ☆56 · Updated last year
- A collection of all available inference solutions for LLMs ☆91 · Updated 8 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆79 · Updated last year
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆131 · Updated last month
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated this week
- ☆312 · Updated this week
- vLLM Router ☆50 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆70 · Updated this week
- ☆57 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- Self-host LLMs with vLLM and BentoML ☆158 · Updated 3 weeks ago
- Inference server benchmarking tool ☆128 · Updated last month
- Comparison of Language Model Inference Engines ☆235 · Updated 11 months ago
- ☆118 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆708 · Updated this week
- ☆64 · Updated 7 months ago
- ☆267 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆317 · Updated last month
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆139 · Updated last year
- The driver for LMCache core to run in vLLM ☆58 · Updated 9 months ago
- ☆31 · Updated 7 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆300 · Updated this week
- Easy and Efficient Quantization for Transformers ☆202 · Updated 4 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- Benchmarking the serving capabilities of vLLM ☆56 · Updated last year
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Updated 7 months ago
- OpenAI compatible API for TensorRT LLM triton backend ☆217 · Updated last year
- ☆120 · Updated last year
- ☆51 · Updated last year