huggingface / inference-benchmarker
Inference server benchmarking tool
☆87 · Updated 3 months ago
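inference-benchmarker itself is a Rust CLI whose exact flags are documented in its README. As a rough illustration of the two quantities such tools typically report (time to first token and decode throughput), here is a minimal Python sketch against an OpenAI-compatible streaming endpoint. The URL, model id, and one-token-per-chunk approximation are assumptions for illustration, not inference-benchmarker's actual interface.

```python
# Minimal sketch (NOT inference-benchmarker's API): measure time to first
# token and streaming throughput against an OpenAI-compatible server.
# The URL and model id below are assumptions; point them at your own server.
import json
import time

import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed vLLM-style endpoint

payload = {
    "model": "my-model",  # hypothetical model id
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "max_tokens": 128,
    "stream": True,
}

start = time.perf_counter()
first_token_at = None
chunks = 0

with requests.post(URL, json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Server-sent events arrive as b"data: {...}" lines.
        if not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        event = json.loads(data)
        choices = event.get("choices") or []
        if choices and choices[0].get("delta", {}).get("content"):
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks += 1  # roughly one token per streamed chunk

end = time.perf_counter()
if first_token_at is None:
    print("no tokens received")
else:
    decode_time = max(end - first_token_at, 1e-9)
    print(f"time to first token: {first_token_at - start:.3f} s")
    print(f"decode rate: {chunks / decode_time:.1f} chunks/s")
```

Tools like inference-benchmarker typically go further than this single-request sample, driving many concurrent virtual users and reporting latency percentiles; consult the repository for the load profiles it actually supports.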
Alternatives and similar repositories for inference-benchmarker
Users interested in inference-benchmarker are comparing it to the libraries listed below.
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆203 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆190 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆265 · Updated 9 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆307 · Updated 2 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆76 · Updated this week
- Comparison of Language Model Inference Engines ☆222 · Updated 7 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆268 · Updated last year
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆461 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 11 months ago
- Where GPUs get cooked 👩‍🍳🔥 ☆266 · Updated this week
- Easy and Efficient Quantization for Transformers ☆198 · Updated last month
- OpenAI compatible API for TensorRT LLM triton backend ☆209 · Updated last year
- 👷 Build compute kernels ☆87 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆141 · Updated last week
- A collection of all available inference solutions for LLMs ☆91 · Updated 5 months ago
- Load compute kernels from the Hub ☆220 · Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models ☆137 · Updated last year
- Google TPU optimizations for transformers models ☆117 · Updated 6 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆323 · Updated 3 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆217 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆258 · Updated last week
- Efficient LLM Inference over Long Sequences ☆385 · Updated last month
- vLLM performance dashboard ☆33 · Updated last year
- LLM KV cache compression made easy ☆566 · Updated this week