mlcommons / mlperf_client
MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios.
☆47 · Updated last month
Alternatives and similar repositories for mlperf_client
Users interested in mlperf_client are comparing it to the repositories listed below.
- ☆102 · Updated last year
- A collection of available inference solutions for LLMs ☆91 · Updated 6 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆64 · Updated last year
- llama.cpp fork used by GPT4All ☆56 · Updated 6 months ago
- LLM inference in C/C++ ☆101 · Updated last week
- No-code CLI designed for accelerating ONNX workflows ☆210 · Updated 2 months ago
- ☆315 · Updated this week
- Train, tune, and run inference with the Bamba model ☆131 · Updated 2 months ago
- The NVIDIA RTX™ AI Toolkit is a suite of tools and SDKs for Windows developers to customize, optimize, and deploy AI models across RTX PC… ☆170 · Updated 9 months ago
- ☆54 · Updated 2 months ago
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆22 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated last year
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ☆31 · Updated this week
- Simple examples using Argilla tools to build AI ☆55 · Updated 9 months ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated 11 months ago
- Self-host LLMs with vLLM and BentoML ☆140 · Updated this week
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆171 · Updated 3 weeks ago
- ☆102 · Updated 2 months ago
- GRadient-INformed MoE ☆264 · Updated 11 months ago
- AMD-related optimizations for transformer models ☆83 · Updated last week
- ☆51 · Updated last year
- ☆238 · Updated this week
- AirLLM: 70B inference with a single 4GB GPU ☆14 · Updated 2 months ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆26 · Updated 5 months ago
- ☆262 · Updated 2 months ago
- This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang/tree/main/docs. ☆72 · Updated this week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆182 · Updated 3 weeks ago
- LLM inference in C/C++ ☆21 · Updated 5 months ago
- ☆85 · Updated last week