
| Introduction
Metrum Insights v3.7 introduces major upgrades across benchmarking workflows, system orchestration, model-serving infrastructure, and the user support agent, making it easier than ever to compare systems, generate performance insights, and run complex tests at scale.
Below is a look at what’s new in this release:
Versus Workspace – Side-by-Side Benchmarking
With v3.7, teams can directly compare two systems in real time through the new Versus Workspace—designed for hardware evaluations and competitive analysis.
Key capabilities:
- Dual-system, side-by-side benchmarking for instant comparison.
- Real-time charts for GPU/CPU usage, throughput, latency, and telemetry metrics.
- Supports streaming LLM inference, audio transcription, and image-to-text workloads.
- Compatible with both CPUs and NVIDIA GPUs, enabling cross-hardware comparison.
This workspace eliminates manual stitching of comparative performance data, allowing performance engineers to validate hardware differences instantly.
User Support Agent Upgrades
The user support agent now enables deeper automation and more intelligent support across the benchmarking workflow.
Chat-Based Project Creation & Execution:
- Create benchmark projects directly through chat.
- Schedule runs, modify parameters, and launch jobs, all via simple natural language prompts.
- Track job progress, status, and completion updates without leaving the chat interface.
Smarter Automation & Integrations:
- Email notifications on job success or failure.
- JIRA integration for one-click ticket creation and automated issue logging.
- Generates post-run analysis, summaries, and debugging insights.
- Expanded internal knowledge base with improved RAG search.
- Built-in web search for latest model, framework, and optimization updates.
These improvements streamline project management for users who need to spin up many projects without manually stepping through creation and execution, and they simplify monitoring by providing a snapshot of benchmark run status through a simple chat interface.
Bring Your Own Endpoint (BYOE)
This feature lets users bring their own custom inference endpoints into Metrum Insights and evaluate endpoint performance under real-world benchmarking scenarios, including varying concurrency levels, different input-prompt and output-response lengths, and accuracy evaluation on the latest industry-standard datasets.
- Compatible with any OpenAI-style API, including OpenAI, OpenRouter, self-hosted endpoints, or enterprise APIs.
- Supports multiple BYOE projects running in parallel, allowing users to test several custom endpoints concurrently.
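For readers new to the OpenAI-style convention BYOE relies on, the sketch below shows the request shape any compatible endpoint is expected to accept. The endpoint URL and model name are hypothetical placeholders, not values from Metrum Insights:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 128, stream: bool = False) -> dict:
    """Build a chat-completions payload in the OpenAI-style schema
    that an OpenAI-compatible endpoint is expected to accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": stream,
    }

# Hypothetical endpoint and model name, shown for illustration only.
BASE_URL = "https://my-inference-host.example.com/v1"
payload = build_chat_request("my-custom-model", "Summarize this log file.", max_tokens=64)

# In practice this payload would be POSTed to f"{BASE_URL}/chat/completions"
# with an Authorization: Bearer <key> header; here we just print the JSON.
print(json.dumps(payload, indent=2))
```

Because OpenAI, OpenRouter, self-hosted servers, and enterprise gateways all speak this same schema, a single benchmarking harness can target any of them by swapping the base URL and credentials.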
vLLM Bench Integration
Metrum Insights now includes full native support for vLLM Bench, bringing an industry-standard performance suite directly into the platform.
- Automated vLLM Bench serving deployment.
- Native visualization of key metrics.
- Export of clean, structured reports in CSV format.
- Zero-setup integration with existing benchmark workflows.
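For context, the kind of run the platform automates can also be driven by hand from vLLM's own CLI. A minimal sketch using vLLM's serving benchmark with synthetic prompts; the model name and parameter values are illustrative:

```shell
# Start a vLLM server for the model under test (model name is illustrative).
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 &

# Drive the serving benchmark against it with randomly generated prompts.
vllm bench serve \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --dataset-name random \
  --num-prompts 200 \
  --request-rate 4
```

The integration removes this manual two-step setup and surfaces the resulting throughput and latency metrics directly in the platform's visualizations.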
Advanced Benchmark Configuration
Engineers now have deeper configuration controls when running workloads.
Users can now specify:
- Environment variables.
- Tensor parallelism (TP).
- Additional model-serving-engine parameters.
- A new reasoning-parser argument for reasoning-optimized models.
These additions support fine-tuned control and analysis of model serving behavior across LLM and multimodal workloads.
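These controls map onto standard serving-engine options. As a hedged example using vLLM's own flags (the model, environment variable, and values are illustrative, not a Metrum Insights configuration):

```shell
# Illustrative vLLM launch showing the kinds of knobs now exposed:
# an environment variable, tensor parallelism, and a reasoning parser
# for a reasoning-optimized model.
VLLM_USE_V1=1 vllm serve deepseek-ai/DeepSeek-R1 \
  --tensor-parallel-size 8 \
  --reasoning-parser deepseek_r1 \
  --max-model-len 32768
```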
Model Serving Upgrades
v3.7 adds support for the latest model serving frameworks:
- vLLM v0.12.0: Delivers faster engine performance with improved long-context handling, expanded quantization/model support, and more efficient GPU/memory execution.
- SGLang v0.5.6: Improves serving stability through key kernel fixes, ensuring more reliable structured and multimodal generation.
- TensorRT-LLM v1.2.0.rc4: Boosts NVIDIA GPU inference efficiency with broader model coverage, optimized kernels, and enhanced support for LoRA-based customization.
These versions bring expanded model support, better scheduling, improved streaming, and new optimization paths, ensuring Metrum Insights stays aligned with the fast-moving serving ecosystem.
| Start Benchmarking Today
Experience the latest version of Metrum Insights and see how easy it is to automate performance testing, compare results across hardware, and generate data-driven insights, all through a simple, no-code interface.
→ Contact us to get started at metrum.ai/insights
| Explore Our Latest Blogs and Media
- Unleashing GPT-OSS-120B: Performance Analysis on Dell PowerEdge XE9680
- Metrum AI Adds NVIDIA DGX Spark Into Metrum Insights
Copyright © 2025 Metrum AI, Inc. All Rights Reserved. All other product names are the trademarks of their respective owners.