October 2025

Metrum AI Adds Support for NVIDIA DGX Spark in Metrum Insights

Metrum AI is announcing support for NVIDIA DGX Spark in Metrum Insights, a platform for benchmarking the performance and accuracy of models and pipelines. This integration makes it easier to measure end-to-end throughput, latency, cost efficiency, and evaluation quality across modern workloads.

"Early results on NVIDIA DGX Spark show strong standalone performance, even against server-class GPUs, translating into lower Tokens-Per-$ on state-of-the-art SLMs. Combined with Metrum Insights' automated evaluations and performance analytics, customers can make smarter deployment decisions, faster."
— Chetan Gadgil, CTO, Metrum AI

Metrum Insights and NVIDIA DGX Spark

  • Development to production: Metrum Insights integrates with DGX Spark to provide a single environment for benchmarking and evaluation, including fully self-contained on-premises deployments.

  • Higher throughput, lower cost: Powered by the NVIDIA GB10 Grace Blackwell Superchip, DGX Spark increases tokens per second and reduces cost per token. Metrum Insights measures the difference with built-in cost modeling.

  • Comparable results: Standardized harnesses for NVIDIA NIM, vLLM, SGLang, LoRA fine-tuning, and RAG pipelines enable apples-to-apples metrics across hardware, models, and configurations.

  • Full-stack visibility: Track utilization, power, and memory alongside model-quality metrics in one place.

  • LLM/VLM workflows: Evaluate inference across varying concurrency and input/output token lengths to validate performance and accuracy on leading models.
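
To make the idea of evaluating across concurrency and input/output token lengths concrete, here is a minimal sketch of a benchmark sweep written as a plain Python grid. The function name, dictionary keys, and the specific values are hypothetical, chosen for illustration only; this is not the Metrum Insights configuration format.

```python
from itertools import product

# Illustrative values only; pick ranges that match your own workload.
CONCURRENCY = [1, 8, 32]        # simultaneous requests
INPUT_TOKENS = [128, 1024]      # prompt lengths
OUTPUT_TOKENS = [128, 512]      # generation lengths


def build_sweep(concurrency, input_tokens, output_tokens):
    """Return one benchmark configuration per combination of the three axes."""
    return [
        {"concurrency": c, "input_tokens": i, "output_tokens": o}
        for c, i, o in product(concurrency, input_tokens, output_tokens)
    ]


sweep = build_sweep(CONCURRENCY, INPUT_TOKENS, OUTPUT_TOKENS)

if __name__ == "__main__":
    for cfg in sweep:
        print(cfg)
```

Running the same grid unchanged on each hardware target is what keeps the comparison apples-to-apples: every configuration pairs one concurrency level with one input length and one output length.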

How It Works

  • Target selection: Choose DGX Spark in project settings. Metrum Insights auto-detects GPU topology, mixed-precision features, and recommended runtime flags.

  • Standard evaluations: Run suites covering latency, throughput, Tokens-Per-Watt, accuracy, and power with flexible benchmark configurations and industry-standard datasets.

  • Dashboards: Monitor Tokens-Per-Watt, p95 latency, and accuracy deltas in unified dashboards; export reports for procurement and capacity planning.
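
The dashboard metrics named above can be sketched from raw run data as follows. This is a minimal illustration under stated assumptions, not Metrum Insights' implementation: the post does not define the metrics, so Tokens-Per-Watt is assumed here to mean throughput (tokens per second) divided by average power draw, Tokens-Per-$ is assumed to mean tokens generated divided by the cost of the run at an hourly rate, and p95 latency uses the nearest-rank method. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunSample:
    """Raw measurements from one benchmark run (hypothetical field names)."""
    latencies_ms: list      # per-request latency in milliseconds
    output_tokens: int      # total tokens generated during the run
    wall_time_s: float      # duration of the run in seconds
    avg_power_w: float      # average power draw in watts
    cost_per_hour_usd: float  # assumed hourly cost of the system


def p95(values):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def tokens_per_second(run):
    return run.output_tokens / run.wall_time_s


def tokens_per_watt(run):
    # Assumed definition: throughput divided by average power.
    return tokens_per_second(run) / run.avg_power_w


def tokens_per_dollar(run):
    # Assumed definition: tokens divided by the cost of the run's wall time.
    run_cost_usd = run.cost_per_hour_usd * run.wall_time_s / 3600
    return run.output_tokens / run_cost_usd


if __name__ == "__main__":
    # Made-up numbers, for demonstration only; not measured results.
    run = RunSample(
        latencies_ms=list(range(1, 21)),
        output_tokens=36_000,
        wall_time_s=60.0,
        avg_power_w=100.0,
        cost_per_hour_usd=0.6,
    )
    print(p95(run.latencies_ms), tokens_per_watt(run), tokens_per_dollar(run))
```

Whatever definitions a team settles on, they should be held fixed across hardware targets so that the resulting numbers remain comparable.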

Early Results

  • Compact performance: DGX Spark delivers near server-class performance in a portable, compact design.

  • Accuracy maintained: Mixed precision and tuned quantization settings preserve model accuracy.

  • Lower cost per token: Improved Tokens-Per-$ for AI agents, RAG, and fine-tuning enables larger context windows and faster responses at the same spend. DGX Spark can be deployed without special facility requirements.

Getting Started

  • Existing users: DGX Spark is now available as a hardware target in the latest Metrum Insights release.

  • New to Metrum? Start a trial project, choose an LLM, and run your first DGX Spark benchmark in minutes.

Metrum Insights provides actionable insights to help teams ship faster and more cost-effective systems. With NVIDIA DGX Spark support in Metrum Insights, you can quantify gains and put them into production.

Contact: contact@metrum.ai