eugr/llama-benchy

llama-benchy - llama-bench style benchmarking tool for all backends

☆618

Alternatives and similar repositories for llama-benchy

Users who are interested in llama-benchy are comparing it to the libraries listed below.

eugr / spark-vllm-docker
Docker configuration for running vLLM on dual DGX Sparks
☆1,956 · Updated this week
spark-arena / sparkrun
sparkrun - launch, manage, and stop LLM inference workloads on NVIDIA DGX Spark systems
☆423 · Updated this week
SeraphimSerapis / tool-eval-bench
Tool-calling quality benchmark for LLM serving stacks. 80+ deterministic scenarios testing multi-turn orchestration, safety boundaries, a…
☆280 · Updated this week
RobTand / prismaquant
Mixed-precision quantization for LLMs. Every layer refracts into a different format based on its sensitivity. Native compressed-tensors e…
☆100 · Updated this week
spark-arena / recipe-registry
Official Spark Arena Recipe Registry
☆54 · Jun 13, 2026 · Updated last month
NVIDIA / dgx-spark-playbooks
Collection of step-by-step playbooks for setting up AI/ML workloads on NVIDIA DGX Spark devices with Blackwell architecture.
☆1,219 · Jul 29, 2026 · Updated last week
albond / DGX_Spark_Qwen3.5-122B-A10B-AR-INT4
Qwen3.5-122B-A10B on DGX Spark: 28.3 → 51 tok/s (+80%)
☆306 · Jun 2, 2026 · Updated 2 months ago
Avarok-Cybersecurity / dgx-vllm
A dedicated effort to build an optimized, bleeding-edge vLLM Docker image with comprehensive DGX support
☆123 · Feb 22, 2026 · Updated 5 months ago
mostlygeek / llama-swap
Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.)
☆5,281 · Updated this week
raphaelamorim / spark-playbooks
NVIDIA DGX Spark Playbooks
☆18 · Nov 26, 2025 · Updated 8 months ago
0rand / DeepSeek-v4-DSpark-Aidendle94-GB10-ServingStack
Docker Compose serving stack for DeepSeek v4 Flash DSpark on the NVIDIA Spark GB10 system, using the Aidendle94 image
☆21 · Jul 8, 2026 · Updated 3 weeks ago
phuongncn / asus-gx10-qwen35-speed-hack
4-5x faster Qwen3.5 on ASUS GX10 / DGX Spark — Hybrid INT4+FP8 + MTP via one shell script
☆32 · Apr 16, 2026 · Updated 3 months ago
niklasfrick / spark-dashboard
Real-time hardware and LLM inference monitoring — GPU, CPU, memory, and vLLM metrics streamed to a dashboard.
☆86 · Jul 28, 2026 · Updated last week
christopherowen / spark-vllm-mxfp4-docker
☆72 · Feb 27, 2026 · Updated 5 months ago
DanTup / spark-evals
Benchmark results for small models and quants that fit on a DGX Spark
☆49 · Updated this week
AEON-7 / vllm-dflash
DFlash vLLM for DGX Spark — Plug & Play Block-Diffusion Speculative Decoding
☆54 · Jun 28, 2026 · Updated last month
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆3,007 · Updated this week
tonyd2wild / DeepSeek-v4-Flash-0731-DSpark-1M-NVFP4-KV-2x-DGX-Spark
DeepSeek V4 Flash DSpark 1M NVFP4 KV recipe for 2x DGX Spark
☆268 · Updated this week
dorangao / dgx-spark-toolkit
☆16 · Jan 15, 2026 · Updated 6 months ago
Plaaasma / FlashQLA-Blackwell
FlashQLA TileLang GDN kernels ported to NVIDIA Blackwell consumer (GB10 / DGX Spark)
☆17 · Jun 5, 2026 · Updated 2 months ago
Avarok-Cybersecurity / atlas
Pure Rust Inference Engine
☆629 · Updated this week
DasDigitaleMomentum / strix-halo-cuda-combined-toolbox
☆17 · Jul 21, 2026 · Updated 2 weeks ago
stevibe / BenchLocal
Test LLMs on real tasks. Compare models side-by-side.
☆393 · Jun 16, 2026 · Updated last month
alexziskind1 / codeneedle
☆303 · May 18, 2026 · Updated 2 months ago
Luce-Org / lucebox
LLM speculative inference server for consumer hardware & heterogeneous computing
☆2,717 · Updated this week
antheas / spark_hwmon
Linux hwmon driver for the NVIDIA DGX Spark (GB10 SoC) that exposes full system power telemetry via standard sensors / sysfs interfaces.
☆29 · Mar 2, 2026 · Updated 5 months ago
kyuz0 / amd-strix-halo-vllm-toolboxes
☆486 · Updated this week
MiaAI-Lab / DeepSeek-V4-Flash-Dual-DGX-Spark-1M-Context
Deploy DeepSeek V4 Flash (MoE reasoning model) on dual DGX Spark nodes with 1M token context, InfiniBand, and FP8 KV-cache
☆92 · Jul 9, 2026 · Updated 3 weeks ago
local-inference-lab / rtx6kpro
RTX 6000 Pro Wiki — Running Large LLMs (Qwen3.5-397B, Kimi-K2.5, GLM-5) on PCIe GPUs without NVLink
☆782 · Updated this week
MiaAI-Lab / DeepSeek-v4-Flash-DSpark-2x-DGX-Spark
DeepSeek-v4-Flash 0731 recipe for 2x DGX Sparks
☆416 · Updated this week
Goekdeniz-Guelmez / MLX-Benchmark
The best benchmark for LLMs on Apple's MLX framework: knowledge and coding tasks.
☆37 · Jun 12, 2026 · Updated last month
kyuz0 / amd-strix-halo-toolboxes
☆1,811 · Updated this week
intel / llm-scaler
☆453 · Updated this week
AEON-7 / Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash
Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.i…
☆435 · Jul 3, 2026 · Updated last month
lukaLLM / DFlash_Qwen3.6_27B_LlamaCPP
☆17 · Jul 16, 2026 · Updated 2 weeks ago
ateska / dgx-spark-prometheus
A Prometheus metrics exporter for NVIDIA DGX Spark clusters.
☆19 · Feb 16, 2026 · Updated 5 months ago
alexziskind1 / llama-throughput-lab
Interactive launcher and benchmarking harness for llama.cpp server throughput, with tests, sweeps, and round-robin load tools.
☆443 · Feb 8, 2026 · Updated 5 months ago
namake-taro / vllm-custom
☆20 · Apr 7, 2026 · Updated 3 months ago
kyuz0 / amd-strix-halo-llm-finetuning
☆98 · Mar 8, 2026 · Updated 4 months ago