Metrum AI Adds Support for NVIDIA DGX Spark in Metrum Insights


Metrum AI is announcing support for NVIDIA DGX Spark in Metrum Insights, a platform for benchmarking the performance and accuracy of models and pipelines. This integration makes it easier to measure end-to-end throughput, latency, cost efficiency, and evaluation quality across modern workloads.

"Early results on NVIDIA DGX Spark show strong standalone performance, even against server-class GPUs, translating into higher Tokens-Per-$ on state-of-the-art SLMs. Combined with Metrum Insights' automated evaluations and performance analytics, customers can make smarter deployment decisions, faster."
- Chetan Gadgil, CTO, Metrum AI

Metrum Insights and NVIDIA DGX Spark


Development to production: Metrum Insights integrates with DGX Spark to provide a single environment for benchmarking and evaluation, including fully self-contained on-premises deployments.
Higher throughput, lower cost: Powered by the NVIDIA GB10 Grace Blackwell Superchip, DGX Spark increases tokens per second and reduces cost per token, and Metrum Insights quantifies both with built-in cost modeling.
Comparable results: Standardized harnesses for NVIDIA NIM, vLLM, SGLang, LoRA fine-tuning, and RAG pipelines enable apples-to-apples metrics across hardware, models, and configurations.
Full-stack visibility: Track utilization, power, and memory alongside model-quality metrics in one place.
LLM/VLM workflows: Evaluate inference across varying concurrency and input/output token lengths to validate performance and accuracy on leading models.
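
As a rough illustration of what sweeping concurrency and input/output token lengths can look like, the sketch below builds one run configuration per combination of the three axes. The values and the build_sweep helper are hypothetical and are not part of Metrum Insights.

```python
from itertools import product

# Hypothetical sweep values chosen for illustration only;
# they are not taken from Metrum Insights or from DGX Spark results.
CONCURRENCY = [1, 8, 32]
INPUT_TOKENS = [128, 2048]
OUTPUT_TOKENS = [128, 1024]


def build_sweep(concurrency, input_tokens, output_tokens):
    """Return one run configuration per combination of the three axes."""
    return [
        {"concurrency": c, "input_tokens": i, "output_tokens": o}
        for c, i, o in product(concurrency, input_tokens, output_tokens)
    ]


sweep = build_sweep(CONCURRENCY, INPUT_TOKENS, OUTPUT_TOKENS)
```

Running every configuration in the grid on each hardware target is one way to keep comparisons like-for-like, since each target sees the same load shapes.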

How It Works

Target selection: Choose DGX Spark in project settings. Metrum Insights auto-detects GPU topology, mixed-precision features, and recommended runtime flags.
Standard evaluations: Run suites covering latency, throughput, Tokens-Per-Watt, accuracy, and power with flexible benchmark configurations and industry-standard datasets.
Dashboards: Monitor Tokens-Per-Watt, p95 latency, and accuracy deltas in unified dashboards; export reports for procurement and capacity planning.
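
For readers unfamiliar with the dashboard metrics, here is a minimal sketch of how p95 latency and Tokens-Per-Watt could be derived from raw measurements. The function names and the nearest-rank percentile method are assumptions made for illustration; Metrum Insights may define or compute these metrics differently.

```python
def p95_latency(latencies_s):
    """Nearest-rank 95th percentile of per-request latencies, in seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # 1-based nearest-rank index: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def tokens_per_watt(total_tokens, duration_s, avg_power_w):
    """Throughput in tokens per second divided by average power draw in watts."""
    if duration_s <= 0 or avg_power_w <= 0:
        raise ValueError("duration and power must be positive")
    return (total_tokens / duration_s) / avg_power_w
```

Under this definition, Tokens-Per-Watt is equivalent to tokens generated per joule of energy consumed during the run.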

Early Results

Compact performance: Portable yet powerful, delivering near server-class performance in a compact design.
Accuracy maintained: Mixed precision plus tuned quantization settings preserve model accuracy.
Lower cost per token: Improved Tokens-Per-$ for AI Agents, RAG, and fine-tuning, enabling larger context windows and faster responses at the same spend. DGX Spark is deployable without special facility requirements.
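
To make the relationship between Tokens-Per-$ and cost per token concrete, the sketch below assumes cost is expressed as a flat hourly figure (for example, amortized hardware plus energy). That cost model and both function names are assumptions for illustration, not the cost modeling built into Metrum Insights.

```python
def tokens_per_dollar(tokens_per_second, cost_per_hour_usd):
    """Tokens generated per US dollar, assuming a flat hourly cost."""
    if cost_per_hour_usd <= 0:
        raise ValueError("hourly cost must be positive")
    return tokens_per_second * 3600 / cost_per_hour_usd


def cost_per_million_tokens(tokens_per_second, cost_per_hour_usd):
    """US dollars per one million tokens; the inverse view of tokens_per_dollar."""
    return 1_000_000 / tokens_per_dollar(tokens_per_second, cost_per_hour_usd)
```

The two figures move in opposite directions: a higher Tokens-Per-$ value corresponds to a lower cost per token.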

Getting Started

Existing users: DGX Spark is now available as a hardware target in the latest Metrum Insights release.
New to Metrum? Start a trial project, choose an LLM, and run your first DGX Spark benchmark in minutes.

Metrum Insights provides actionable insights to help teams ship faster and more cost-effective systems. With NVIDIA DGX Spark support in Metrum Insights, you can quantify gains and put them into production.

Contact: contact@metrum.ai