qoofyk / LLM_Sizing_Guide
A calculator to estimate the memory footprint, capacity, and latency of LLM deployments on VMware Private AI with NVIDIA.
☆37 · Updated 3 months ago
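For a sense of the arithmetic such a sizing calculator performs, here is a minimal sketch that estimates weight memory, KV-cache memory, and a bandwidth-bound decode latency from first principles. The model shapes (a hypothetical 70B model with grouped-query attention) and the 3350 GB/s HBM bandwidth figure are illustrative assumptions, not values taken from LLM_Sizing_Guide itself.

```python
# Minimal sketch of LLM sizing arithmetic (illustrative values; not the
# formulas used by LLM_Sizing_Guide itself).

def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Weight footprint: parameter count x precision width (FP16 = 2 bytes)."""
    return params_billions * bytes_per_param

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x precision."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

# Hypothetical 70B model with GQA (8 KV heads), served in FP16.
weights = weight_memory_gb(70)                       # 140 GB
kv = kv_cache_gb(80, 8, 128, seq_len=4096, batch=8)  # ~10.7 GB

# Decode is typically memory-bandwidth-bound: each generated token must
# stream the weights (plus KV cache) from HBM once.
hbm_bandwidth_gbps = 3350  # assumed H100-class HBM bandwidth, GB/s
ms_per_token = (weights + kv) / hbm_bandwidth_gbps * 1000
print(f"weights {weights:.0f} GB, KV {kv:.1f} GB, ~{ms_per_token:.0f} ms/token")
```

Real sizing tools refine this with tensor-parallel splits, activation overheads, and prefill compute, but the back-of-envelope version already shows why weight precision and context length dominate capacity planning.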
Alternatives and similar repositories for LLM_Sizing_Guide
Users interested in LLM_Sizing_Guide are comparing it to the repositories listed below.
- ☆56 · Updated last year
- A collection of all available inference solutions for LLMs ☆91 · Updated 8 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆79 · Updated last year
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆131 · Updated last month
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated this week
- ☆312 · Updated this week
- vLLM Router ☆50 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆70 · Updated this week
- ☆57 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- Self-host LLMs with vLLM and BentoML ☆158 · Updated 3 weeks ago
- Inference server benchmarking tool ☆128 · Updated last month
- Comparison of Language Model Inference Engines ☆235 · Updated 11 months ago
- ☆118 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆708 · Updated this week
- ☆64 · Updated 7 months ago
- ☆267 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆317 · Updated last month
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆139 · Updated last year
- The driver for LMCache core to run in vLLM ☆58 · Updated 9 months ago
- ☆31 · Updated 7 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆300 · Updated this week
- Easy and Efficient Quantization for Transformers ☆202 · Updated 4 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- Benchmarking the serving capabilities of vLLM ☆56 · Updated last year
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Updated 7 months ago
- OpenAI compatible API for TensorRT LLM triton backend ☆217 · Updated last year
- ☆120 · Updated last year
- ☆51 · Updated last year