RobTand/prismaquant

Mixed-precision quantization for LLMs. Every layer refracts into a different format based on its sensitivity. Native compressed-tensors export, validated on Qwen3.6-35B-A3B MoE with MTP speculative decoding.

☆96

Alternatives and similar repositories for prismaquant

Users interested in prismaquant are comparing it to the libraries listed below.

spark-arena / sparkrun
sparkrun - launch, manage, and stop LLM inference workloads on NVIDIA DGX Spark systems
☆399 · Updated this week
albond / DGX_Spark_Qwen3.5-122B-A10B-AR-INT4
Qwen3.5-122B-A10B on DGX Spark: 28.3 → 51 tok/s (+80%)
☆281 · Jun 2, 2026 · Updated last month
Plaaasma / FlashQLA-Blackwell
FlashQLA TileLang GDN kernels ported to NVIDIA Blackwell consumer (GB10 / DGX Spark)
☆17 · Jun 5, 2026 · Updated last month
SeraphimSerapis / tool-eval-bench
Tool-calling quality benchmark for LLM serving stacks. 80+ deterministic scenarios testing multi-turn orchestration, safety boundaries, a…
☆242 · Updated this week
spark-arena / recipe-registry
Official Spark Arena Recipe Registry
☆51 · Jun 13, 2026 · Updated last month
eugr / spark-vllm-docker
Docker configuration for running vLLM on dual DGX Sparks
☆1,859 · Updated this week
0rand / DeepSeek-v4-DSpark-Aidendle94-GB10-ServingStack
Docker compose serving stack for DeepSeek v4 Flash DSpark for NVIDIA Spark GB10 system using Aidendle94 image
☆18 · Jul 8, 2026 · Updated last week
DanTup / spark-evals
Some benchmark results of small models and quants that fit on DGX Spark
☆48 · Updated this week
phuongncn / asus-gx10-qwen35-speed-hack
4-5x faster Qwen3.5 on ASUS GX10 / DGX Spark — Hybrid INT4+FP8 + MTP via one shell script
☆31 · Apr 16, 2026 · Updated 3 months ago
eugr / llama-benchy
llama-benchy - llama-bench style benchmarking tool for all backends
☆580 · Jul 10, 2026 · Updated last week
mARTin-B78 / dgx-spark_lite-llm_llama-swap_vllm_llama-cpp_ollama
LLM stack for NVIDIA DGX Spark containing LiteLLM, llama-swap, vLLM, llama.cpp and Ollama
☆34 · Jul 6, 2026 · Updated 2 weeks ago
antheas / spark_hwmon
Linux hwmon driver for the NVIDIA DGX Spark (GB10 SoC) that exposes full system power telemetry via standard sensors / sysfs interfaces.
☆25 · Mar 2, 2026 · Updated 4 months ago
niklasfrick / spark-dashboard
Real-time hardware and LLM inference monitoring — GPU, CPU, memory, and vLLM metrics streamed to a dashboard.
☆83 · Jul 9, 2026 · Updated last week
Entrpi / qwen3.5-122B-A10B-on-spark
Qwen3.5-122B-A10B on a DGX Spark with DFlash speculative decode. One-shot Docker/vLLM installer. 80+ tok/s!
☆37 · Jun 29, 2026 · Updated 3 weeks ago
AEON-7 / vllm-dflash
DFlash vLLM for DGX Spark — Plug & Play Block-Diffusion Speculative Decoding
☆52 · Jun 28, 2026 · Updated 3 weeks ago
joeynyc / spark-doctor
Local diagnostic CLI for NVIDIA DGX Spark (GB10). Detects power caps, UMA pressure, thermal risk, CUDA 13/SM_121 wheel mismatches, Docker…
☆92 · Jul 11, 2026 · Updated last week
calico88x / DGX-Model-Manager
Single-file web UI for NVIDIA DGX Spark — pull Ollama models, browse and download from HuggingFace, manage LiteLLM routing, and control S…
☆28 · May 19, 2026 · Updated 2 months ago
haven-jeon / btop
A monitor of resources for DGX Spark
☆20 · Feb 13, 2026 · Updated 5 months ago
Avarok-Cybersecurity / atlas
Pure Rust Inference Engine
☆606 · Updated this week
Entrpi / ds4-on-spark
Entrpi/ds4, a Blackwell CUDA perf fork of antirez/ds4 on NVIDIA DGX Spark: one-command install, ~2x upstream prefill, ~1.5x decode, DSpar…
☆62 · Updated this week
whpthomas / spark-auto-round
☆17 · Jun 27, 2026 · Updated 3 weeks ago
flash7777 / vllm-marlin-sm12x
vLLM fork with Marlin W4A8 SM121 patches + TMA module
☆16 · Mar 23, 2026 · Updated 3 months ago
AEON-7 / Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash
Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.i…
☆426 · Jul 3, 2026 · Updated 2 weeks ago
christopherowen / spark-vllm-mxfp4-docker
☆72 · Feb 27, 2026 · Updated 4 months ago
timothystewart6 / vllm-gb10
Bleeding edge vLLM Docker image for the NVIDIA DGX Spark (GB10 / sm_121a).
☆36 · Updated this week
autoscriptlabs / nccl-mesh-plugin
☆108 · Mar 6, 2026 · Updated 4 months ago
AEON-7 / Qwen3.6-35B-A3B-heretic-NVFP4-DFlash
Qwen3.6-35B-A3B-heretic NVFP4 + DFlash speculative decoding on DGX Spark (GB10/sm_121a). Source-built vLLM image + 7 patches + comprehens…
☆127 · Jun 28, 2026 · Updated 3 weeks ago
parallelArchitect / sparkview
Operator-grade GPU monitor for NVIDIA GPUs with native GB10 / DGX Spark coherent UMA support — PSI pressure, clock detection, ConnectX-7 …
☆23 · May 31, 2026 · Updated last month
lukealonso / b12x
☆138 · Updated this week
kreuzhofer / dgx-spark-unsloth-qwen3.5-training
bf16 LoRA fine-tuning of [Qwen3.5-35B-A3B](https://huggingface.co/unsloth/Qwen3.5-35B-A3B) (a 35B-total / 3B-active Mixture-of-Experts vi…
☆15 · Mar 12, 2026 · Updated 4 months ago
mARTin-B78 / dgx-spark-faster-qwen3-tts
Run Faster-Qwen3-TTS on NVIDIA DGX Spark GB10 (ARM64/SM121/CUDA13) - OpenAI-compatible TTS API with CUDA graph acceleration
☆15 · Jun 26, 2026 · Updated 3 weeks ago
mark-ramsey-ri / vllm-dgx-spark
Run vLLM on 1-to-N NVIDIA DGX Spark servers (single Spark, 2 via direct cable, or 3+ via switched fabric) to serve or benchmark LLMs
☆122 · Jun 22, 2026 · Updated 3 weeks ago
allenporter / home-assistant-synthetic-home
A Home Assistant custom component used for generating a synthetic home
☆19 · Updated this week
CG-8663 / turboquant-tinygrad-bridge
Compressed KV cache as cross-backend wire format for Metal + CUDA split inference over Thunderbolt 5
☆16 · Apr 14, 2026 · Updated 3 months ago
adadrag / qwen3.5-dgx-spark
Complete guide to running Qwen3.5-35B-A3B on NVIDIA DGX Spark (GB10) with vLLM - installation, benchmarks, vision features, and troublesh…
☆96 · Mar 11, 2026 · Updated 4 months ago
violetxi / ExpRL
☆19 · Jun 16, 2026 · Updated last month
elementalcollision / GraphMemory-IDE
AI-assisted development MCP providing long-term, on-device "AI memory" for IDEs. Powered by Kuzu GraphDB and exposed via MCP server
☆15 · Jun 21, 2026 · Updated last month
lna-lab / blackwell-geforce-nvfp4-gemm
NVFP4 inference on Blackwell GeForce (RTX 5090/5080/5070 Ti/RTX PRO 6000) — SM120 patches for vLLM + FlashInfer + CUTLASS. 175 tok/s on Q…
☆21 · Apr 27, 2026 · Updated 2 months ago
NVIDIA / dgx-spark-playbooks
Collection of step-by-step playbooks for setting up AI/ML workloads on NVIDIA DGX Spark devices with Blackwell architecture.
☆1,158 · Updated this week