mlcommons / mlperf_client
MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios.
☆51 · Updated last month
Alternatives and similar repositories for mlperf_client
Users interested in mlperf_client are comparing it to the libraries listed below.
- LLM inference in C/C++ ☆101 · Updated last month
- GPT-4 Level Conversational QA Trained In a Few Hours ☆64 · Updated last year
- Intel® AI Assistant Builder ☆106 · Updated this week
- Transformer GPU VRAM estimator ☆66 · Updated last year
- No-code CLI designed for accelerating ONNX workflows ☆214 · Updated 3 months ago
- ☆102 · Updated last year
- ☆338 · Updated this week
- AMD-related optimizations for transformer models ☆88 · Updated last month
- A collection of all available inference solutions for LLMs ☆91 · Updated 6 months ago
- GPTQ and efficient search for GGUF ☆48 · Updated last week
- ☆57 · Updated 3 months ago
- LLM inference on consumer devices ☆124 · Updated 6 months ago
- Documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang/tree/main/docs ☆78 · Updated this week
- ☆97 · Updated last month
- 1.58-bit LLM on Apple Silicon using MLX ☆223 · Updated last year
- A command-line interface tool for serving LLMs using vLLM ☆414 · Updated last month
- ☆104 · Updated 3 months ago
- ☆17 · Updated 9 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated last year
- Self-host LLMs with vLLM and BentoML ☆150 · Updated 2 weeks ago
- ☆57 · Updated 4 months ago
- llama.cpp fork used by GPT4All ☆56 · Updated 7 months ago
- Train, tune, and run inference with the Bamba model ☆132 · Updated 3 months ago
- Distributed inference for MLX LLMs ☆95 · Updated last year
- 1.58-bit LLaMA model ☆82 · Updated last year
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆89 · Updated this week
- ☆62 · Updated 2 months ago
- Simple examples using Argilla tools to build AI ☆55 · Updated 10 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆181 · Updated 2 weeks ago
- CPU inference for the DeepSeek family of large language models in C++ ☆314 · Updated 3 months ago
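Several entries above (the transformer GPU VRAM estimator, the low-bit quantization libraries) revolve around the same back-of-the-envelope arithmetic: VRAM is dominated by weight storage plus the KV cache. The sketch below is an illustrative approximation only; it is not the logic of any listed tool, and all parameter names and default values are assumptions chosen to resemble a typical 7B-class model.

```python
# Rough VRAM estimate for serving a transformer model.
# Illustrative approximation only -- not the actual formula used by any
# of the estimator tools listed above. Defaults are hypothetical values
# loosely modeled on a 7B-class model served in fp16.

def estimate_vram_gib(n_params_billions: float,
                      bytes_per_param: int = 2,   # fp16/bf16 weights
                      n_layers: int = 32,
                      n_kv_heads: int = 8,
                      head_dim: int = 128,
                      context_len: int = 4096,
                      batch_size: int = 1,
                      kv_bytes: int = 2) -> float:
    """Return an approximate VRAM requirement in GiB."""
    # Weight storage: parameter count times bytes per parameter.
    weights = n_params_billions * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, each
    # kv_heads * head_dim * context_len * batch_size elements.
    kv_cache = (2 * n_layers * n_kv_heads * head_dim
                * context_len * batch_size * kv_bytes)
    # Activations and framework workspace: a rough ~10% rule of thumb.
    overhead = 0.10 * weights
    return (weights + kv_cache + overhead) / 2**30

# Example: a 7B-parameter model in fp16 with a 4096-token context.
print(f"{estimate_vram_gib(7.0):.1f} GiB")
```

Note how quantization shifts the dominant term: dropping `bytes_per_param` from 2 (fp16) to 1 (int8) or ~0.5 (4-bit) roughly halves or quarters the weight component, which is why the low-bit inference libraries in the list can fit large models on consumer GPUs.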