mlcommons / mlperf_client
MLPerf Client is a benchmark for Windows, Linux, and macOS that focuses on ML inference scenarios on client form factors.
☆73 · Updated 2 months ago
Alternatives and similar repositories for mlperf_client
Users interested in mlperf_client are comparing it to the libraries listed below:
- No-code CLI designed for accelerating ONNX workflows ☆227 · Updated 7 months ago
- Transformer GPU VRAM estimator ☆68 · Updated last year
- Train, tune, and infer Bamba model ☆138 · Updated 8 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- AMD-related optimizations for transformer models ☆97 · Updated 3 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆203 · Updated 4 months ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆40 · Updated 6 months ago
- Fully Open Language Models with Stellar Performance ☆318 · Updated 2 months ago
- LLM inference in C/C++ ☆104 · Updated last week
- A collection of all available inference solutions for LLMs ☆94 · Updated 11 months ago
- Intel® AI Super Builder ☆156 · Updated last week
- Building blocks for agents in C++ ☆139 · Updated last week
- Route LLM requests to the best model for the task at hand ☆173 · Updated 3 weeks ago
- Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any Large Language Model … ☆590 · Updated 3 weeks ago
- Benchmark and optimize LLM inference across frameworks with ease ☆161 · Updated 4 months ago
- Self-host LLMs with vLLM and BentoML ☆168 · Updated 2 weeks ago
- CPU inference for the DeepSeek family of large language models in C++ ☆317 · Updated 4 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 · Updated this week
- High-performance FP32 GEMM on CUDA devices ☆117 · Updated last year
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆287 · Updated 3 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆352 · Updated last year
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code ☆74 · Updated last year
- ☆118 · Updated last month
- Code for the paper "Analog Foundation Models" ☆30 · Updated 4 months ago
- ☆219 · Updated last year
- Inference code for LLaMA models ☆41 · Updated 2 years ago
- The documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang ☆100 · Updated last week
- We aim to redefine Data Parallel libraries' portability, performance, programmability, and maintainability by using C++ standard features, i… ☆47 · Updated this week
- Benchmarking the serving capabilities of vLLM ☆59 · Updated last year
- Efficient non-uniform quantization with GPTQ for GGUF ☆58 · Updated 4 months ago