fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆70 · Updated 2 months ago
Alternatives and similar repositories for benchmark:
Users interested in benchmark are comparing it to the libraries listed below.
- ☆49 · Updated 4 months ago
- ☆54 · Updated 6 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆112 · Updated 4 months ago
- The driver for LMCache core to run in vLLM ☆36 · Updated 2 months ago
- KV cache compression for high-throughput LLM inference ☆125 · Updated 2 months ago
- LLM Serving Performance Evaluation Harness ☆75 · Updated last month
- ☆185 · Updated 6 months ago
- ☆117 · Updated last year
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆64 · Updated this week
- Experiments with inference on Llama ☆104 · Updated 10 months ago
- Experiments on speculative sampling with Llama models ☆125 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆97 · Updated 3 weeks ago
- vLLM performance dashboard ☆26 · Updated 11 months ago
- ☆241 · Updated this week
- Simple implementation of Speculative Sampling in NumPy for GPT-2 ☆93 · Updated last year
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs ☆82 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 3 months ago
- Perplexity GPU Kernels ☆185 · Updated last week
- vLLM adapter for a TGIS-compatible gRPC server ☆25 · Updated this week
- A low-latency & high-throughput serving engine for LLMs ☆337 · Updated 2 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆11 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆72 · Updated 7 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving ☆65 · Updated last year
- Inference server benchmarking tool ☆48 · Updated last week
- ☆205 · Updated 2 months ago
- Efficiently tune any LLM from HuggingFace using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate the … ☆56 · Updated last year
- ☆117 · Updated 11 months ago
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 6 months ago
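Several entries above center on speculative decoding (the NumPy Speculative Sampling demo, the Llama speculative-sampling experiments, Ouroboros). As orientation, here is a minimal, hypothetical NumPy sketch of the standard accept/reject verification step; it is a generic illustration of the algorithm, not code from any of the listed repositories, and the function name and shapes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_verify(draft_probs, target_probs, drafted):
    """Verify a block of draft-model tokens against the target model.

    draft_probs, target_probs: arrays of shape (gamma, vocab) holding the
    per-position token distributions of the draft and target models.
    drafted: the gamma token ids sampled from the draft model.
    Returns (tokens, all_accepted): the accepted tokens plus, on the first
    rejection, one token resampled from the residual distribution.
    (Names and shapes here are illustrative assumptions.)
    """
    out = []
    for i, tok in enumerate(drafted):
        p_t, p_d = target_probs[i][tok], draft_probs[i][tok]
        if rng.random() < min(1.0, p_t / p_d):
            out.append(int(tok))           # accept drafted token w.p. min(1, p_t/p_d)
            continue
        # On rejection, resample from the normalized residual max(0, p_t - p_d).
        residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
        residual /= residual.sum()
        out.append(int(rng.choice(len(residual), p=residual)))
        return out, False                  # stop at the first rejection
    return out, True
```

On acceptance of all gamma tokens the target model's next-token distribution supplies one extra free token, which is where the throughput gain of speculative decoding comes from.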