mlcommons / mlperf_client
MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios.
☆51 · Updated last month
Alternatives and similar repositories for mlperf_client
Users interested in mlperf_client are comparing it to the libraries listed below.
- LLM inference in C/C++ ☆101 · Updated last month
- GPT-4 Level Conversational QA Trained In a Few Hours ☆64 · Updated last year
- Intel® AI Assistant Builder ☆106 · Updated this week
- Transformer GPU VRAM estimator ☆66 · Updated last year
- No-code CLI designed for accelerating ONNX workflows ☆214 · Updated 3 months ago
- ☆102 · Updated last year
- ☆338 · Updated this week
- AMD-related optimizations for transformer models ☆88 · Updated last month
- A collection of all available inference solutions for LLMs ☆91 · Updated 6 months ago
- GPTQ and efficient search for GGUF ☆48 · Updated last week
- ☆57 · Updated 3 months ago
- LLM inference on consumer devices ☆124 · Updated 6 months ago
- Documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang/tree/main/docs ☆78 · Updated this week
- ☆97 · Updated last month
- 1.58-bit LLM on Apple Silicon using MLX ☆223 · Updated last year
- A command-line interface tool for serving LLMs using vLLM ☆414 · Updated last month
- ☆104 · Updated 3 months ago
- ☆17 · Updated 9 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated last year
- Self-host LLMs with vLLM and BentoML ☆150 · Updated 2 weeks ago
- ☆57 · Updated 4 months ago
- llama.cpp fork used by GPT4All ☆56 · Updated 7 months ago
- Train, tune, and run inference with the Bamba model ☆132 · Updated 3 months ago
- Distributed inference for MLX LLMs ☆95 · Updated last year
- 1.58-bit LLaMA model ☆82 · Updated last year
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆89 · Updated this week
- ☆62 · Updated 2 months ago
- Simple examples using Argilla tools to build AI ☆55 · Updated 10 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆181 · Updated 2 weeks ago
- CPU inference for the DeepSeek family of large language models in C++ ☆314 · Updated 3 months ago
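Several entries above (the transformer GPU VRAM estimator, the low-bit quantization libraries) revolve around the same back-of-the-envelope arithmetic: VRAM is dominated by weight storage plus the KV cache. The sketch below is an illustrative approximation only; it is not the logic of any listed tool, and all parameter names and default values are assumptions chosen to resemble a typical 7B-class model.

```python
# Rough VRAM estimate for serving a transformer model.
# Illustrative approximation only -- not the actual formula used by any
# of the estimator tools listed above. Defaults are hypothetical values
# loosely modeled on a 7B-class model served in fp16.

def estimate_vram_gib(n_params_billions: float,
                      bytes_per_param: int = 2,   # fp16/bf16 weights
                      n_layers: int = 32,
                      n_kv_heads: int = 8,
                      head_dim: int = 128,
                      context_len: int = 4096,
                      batch_size: int = 1,
                      kv_bytes: int = 2) -> float:
    """Return an approximate VRAM requirement in GiB."""
    # Weight storage: parameter count times bytes per parameter.
    weights = n_params_billions * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, each
    # kv_heads * head_dim * context_len * batch_size elements.
    kv_cache = (2 * n_layers * n_kv_heads * head_dim
                * context_len * batch_size * kv_bytes)
    # Activations and framework workspace: a rough ~10% rule of thumb.
    overhead = 0.10 * weights
    return (weights + kv_cache + overhead) / 2**30

# Example: a 7B-parameter model in fp16 with a 4096-token context.
print(f"{estimate_vram_gib(7.0):.1f} GiB")
```

Note how quantization shifts the dominant term: dropping `bytes_per_param` from 2 (fp16) to 1 (int8) or ~0.5 (4-bit) roughly halves or quarters the weight component, which is why the low-bit inference libraries in the list can fit large models on consumer GPUs.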