mlcommons / mlperf_client
MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios.
☆55, updated last month
Alternatives and similar repositories for mlperf_client
Users interested in mlperf_client are comparing it to the libraries listed below.
- No-code CLI designed for accelerating ONNX workflows (☆216, updated 4 months ago)
- Intel® AI Assistant Builder (☆117, updated last week)
- AMD related optimizations for transformer models (☆94, updated 3 weeks ago)
- LLM inference in C/C++ (☆103, updated last week)
- llama.cpp fork used by GPT4All (☆57, updated 8 months ago)
- An innovative library for efficient LLM inference via low-bit quantization (☆349, updated last year)
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (☆92, updated last week)
- Transformer GPU VRAM estimator (☆67, updated last year)
- CPU inference for the DeepSeek family of large language models in C++ (☆315, updated last month)
- 1.58-bit LLM on Apple Silicon using MLX (☆225, updated last year)
- A command-line interface tool for serving LLMs using vLLM (☆440, updated 3 weeks ago)
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… (☆42, updated last year)
- LLM training in simple, raw C/HIP for AMD GPUs (☆53, updated last year)
- Documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang/tree/main/docs (☆88, updated this week)
- Self-host LLMs with vLLM and BentoML (☆154, updated last week)
- The NVIDIA RTX™ AI Toolkit is a suite of tools and SDKs for Windows developers to customize, optimize, and deploy AI models across RTX PC… (☆176, updated 11 months ago)
- A collection of all available inference solutions for LLMs (☆91, updated 8 months ago)
- Run Generative AI models with a simple C++/Python API using the OpenVINO Runtime (☆368, updated this week)
- A minimalistic C++ Jinja templating engine for LLM chat templates (☆195, updated last month)
- GPT-4 Level Conversational QA Trained In a Few Hours (☆65, updated last year)
- Utils for Unsloth: https://github.com/unslothai/unsloth (☆165, updated this week)
- Route LLM requests to the best model for the task at hand (☆122, updated last week)
- LLM inference on consumer devices (☆125, updated 7 months ago)
- Inference of Llama/Llama2/Llama3 models in NumPy (☆21, updated last year)