| Introduction

Metrum Insights continues to evolve as the platform of choice for AI performance benchmarking, delivering features that streamline testing, enhance observability, and accelerate decision-making. The latest release, v3.6, brings significant upgrades to model serving frameworks, automated evaluation capabilities, system health monitoring, and user experience. Here are the key features from v3.6:

Model Serving Framework Upgrades

Metrum Insights v3.6 includes major updates to the core model serving frameworks, ensuring compatibility with the latest performance optimizations and features. The platform now supports vLLM v0.11.0, SGLang v0.5.3, TensorRT-LLM v1.0.0, and HabanaAI’s vllm-fork, giving users improved throughput, broader model coverage, and greater scalability.
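To give a concrete sense of the upgraded stack, the sketch below serves a single prompt through vLLM's Python API. The model name is a placeholder, and in practice Metrum Insights drives these frameworks for you through its no-code workflows.

    # Minimal sketch of a generation request against vLLM (one of the
    # supported serving frameworks); the model name is a placeholder.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
    params = SamplingParams(max_tokens=128, temperature=0.0)

    outputs = llm.generate(
        ["Summarize the benefits of batched inference."], params
    )
    print(outputs[0].outputs[0].text)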

Auto-Evaluation of Performance Data

A new automated evaluation pipeline streamlines the analysis of performance metrics and model response data. This feature provides comprehensive summaries of scenarios and runs, aggregated performance results, and detailed error reporting, enabling teams to quickly identify performance trends, anomalies, and optimization opportunities with minimal manual intervention.
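As a rough illustration of what this kind of aggregation involves, the sketch below rolls per-run result files into a single summary. The directory layout and field names are assumptions for demonstration, not the platform's actual schema.

    # Illustrative aggregation over per-run benchmark results.
    # The results.json layout and field names are assumed for this sketch.
    import json
    import statistics
    from pathlib import Path

    def summarize_runs(results_dir: str) -> dict:
        throughputs, latencies, errors = [], [], 0
        # Assumed layout: one results.json per run directory.
        for path in Path(results_dir).glob("*/results.json"):
            run = json.loads(path.read_text())
            throughputs.append(run["tokens_per_second"])
            latencies.append(run["p99_latency_ms"])
            errors += len(run.get("errors", []))
        return {
            "runs": len(throughputs),
            "mean_throughput_tps": statistics.mean(throughputs),
            "worst_p99_latency_ms": max(latencies),
            "total_errors": errors,
        }

    print(summarize_runs("./benchmark_runs"))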

Enhanced System Health Checks & Metrics

Metrum Insights now leverages Redfish APIs to extend system monitoring capabilities, providing more visibility into hardware health and operational metrics. This enhancement allows teams to proactively monitor system conditions, track hardware performance indicators, and correlate infrastructure health with benchmarking results for more reliable testing environments.
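For readers unfamiliar with Redfish, the sketch below shows the style of query involved, polling a standard DMTF Redfish thermal endpoint on a BMC. The address and credentials are placeholders; Metrum Insights performs this monitoring automatically and correlates it with run data.

    # Sketch of polling hardware thermals over the DMTF Redfish API.
    # BMC address and credentials below are placeholders.
    import requests

    BMC = "https://10.0.0.42"        # placeholder BMC address
    AUTH = ("admin", "password")     # placeholder credentials

    resp = requests.get(
        f"{BMC}/redfish/v1/Chassis/1/Thermal",  # standard thermal resource
        auth=AUTH,
        verify=False,  # many BMCs ship self-signed certificates
    )
    for sensor in resp.json().get("Temperatures", []):
        print(sensor["Name"], sensor.get("ReadingCelsius"))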

Error Handling & Robustness Improvements

v3.6 introduces smarter validation for user inputs such as Hugging Face tokens and API keys, along with more intuitive, user-readable error messages throughout the platform. Users can also download log files from the system-under-test directly through Metrum Insights, making troubleshooting faster and more transparent.
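As an example of the kind of upfront validation described here, the sketch below checks a Hugging Face token against the public Hub API before any benchmark work begins. The error wording is illustrative, not the platform's actual message.

    # Sketch of validating a Hugging Face token before starting a run,
    # using the public huggingface_hub client; error text is our own.
    from huggingface_hub import HfApi
    from huggingface_hub.utils import HfHubHTTPError

    def validate_hf_token(token: str) -> str:
        try:
            user = HfApi(token=token).whoami()
            return f"Token valid for user: {user['name']}"
        except HfHubHTTPError:
            return ("Hugging Face token was rejected; check that it is "
                    "current and has read access.")

    print(validate_hf_token("hf_..."))  # placeholder token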

Dynamic Prompting & Reasoning Datasets

New dataset generation logic supports reasoning-based prompt evaluation and dynamically generates prompts with varied input and output lengths, helping teams test a wider variety of real-world performance scenarios.
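A minimal sketch of what dynamic prompt generation can look like appears below. The word-count range and output-length targets are illustrative assumptions, not the platform's actual generation logic.

    # Illustrative generator of prompts with varied input/output lengths.
    # The length knobs below are assumptions for demonstration.
    import random

    def generate_prompts(n: int, min_words: int = 32, max_words: int = 512):
        for _ in range(n):
            length = random.randint(min_words, max_words)
            filler = " ".join(["context"] * length)   # stand-in for corpus text
            max_out = random.choice([64, 256, 1024])  # varied output targets
            yield {"prompt": filler, "max_output_tokens": max_out}

    for sample in generate_prompts(3):
        print(len(sample["prompt"].split()), sample["max_output_tokens"])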

| Start Benchmarking Today

Experience the latest version of Metrum Insights and see how easy it is to automate performance testing, compare results across hardware, and generate data-driven insights, all through a simple, no-code interface.

Contact us to get started at metrum.ai/insights


Copyright © 2025 Metrum AI, Inc. All Rights Reserved. All other product names are the trademarks of their respective owners.