huggingface / inference-benchmarker
Inference server benchmarking tool
⭐136 · Updated 3 months ago
Alternatives and similar repositories for inference-benchmarker
Users interested in inference-benchmarker are comparing it to the libraries listed below.
- ⭐274 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ⭐326 · Updated 3 months ago
- Comparison of Language Model Inference Engines ⭐239 · Updated last year
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ⭐368 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ⭐190 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐267 · Updated last month
- OpenAI compatible API for TensorRT LLM triton backend ⭐219 · Updated last year
- 👷 Build compute kernels ⭐201 · Updated last week
- Where GPUs get cooked 👩‍🍳🔥 ⭐347 · Updated 4 months ago
- Benchmark suite for LLMs from Fireworks.ai ⭐84 · Updated last month
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ⭐799 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ⭐351 · Updated last year
- ⭐324 · Updated this week
- ⭐218 · Updated 11 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ⭐233 · Updated this week
- ⭐56 · Updated last year
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ⭐806 · Updated last week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ⭐269 · Updated this week
- Efficient LLM Inference over Long Sequences ⭐393 · Updated 6 months ago
- Load compute kernels from the Hub ⭐359 · Updated last week
- Utils for Unsloth https://github.com/unslothai/unsloth ⭐186 · Updated last week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ⭐138 · Updated last year
- A collection of all available inference solutions for the LLMs ⭐94 · Updated 10 months ago
- Self-host LLMs with vLLM and BentoML ⭐163 · Updated last month
- Common recipes to run vLLM ⭐335 · Updated last week
- ⭐60 · Updated last year
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ⭐251 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ⭐62 · Updated 4 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ⭐278 · Updated last year
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ⭐277 · Updated this week