Metrum Insights v3.9 is live.

Latest Features Available

VectorDB benchmarking, KV cache SSD offloading, and upgraded runtimes with support for the latest models.

VectorDB Bench (Milvus)

End-to-end vector database benchmarking with HNSW and DISKANN indexing, plus new storage metrics including IOPS, latency, throughput, queue depth, and DRAM usage.

SSD Offload via LMCache

KV cache disk offloading for NVIDIA Dynamo + vLLM, reducing GPU and DRAM pressure for large models and long contexts.

vLLM & SGLang Upgrades

vLLM upgraded to 0.19.0 with support for Gemma 4, GLM 5.1, and Minimax 2.7. SGLang upgraded to 0.5.10.post1.

Fully Automated AI Performance Benchmarking

Configure unlimited combinations of models, software, hardware, and hyperparameters in seconds.

Parameters

Configure and test across multiple dimensions simultaneously.

Concurrency Levels
Token Lengths
Request Rates
Precision Modes
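
As a rough illustration of what testing across several dimensions at once involves, the sketch below enumerates every combination of the four parameter types listed above. The values in `GRID` and the `sweep` helper are hypothetical and are not Metrum Insights configuration or API.

```python
from itertools import product

# Hypothetical example values for illustration only.
GRID = {
    "concurrency": [1, 8, 32],      # concurrency levels
    "input_tokens": [128, 2048],    # token lengths
    "request_rate": [1.0, 4.0],     # requests per second
    "precision": ["fp16", "fp8"],   # precision modes
}


def sweep(grid):
    """Yield one dict per combination of parameter values (Cartesian product)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


runs = list(sweep(GRID))
print(len(runs))   # number of benchmark configurations in the grid
print(runs[0])     # first configuration
```

The number of runs is the product of the list lengths, so each added dimension multiplies the size of the sweep.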

Chips

Benchmark across architectures.

NVIDIA Datacenter
AMD Instinct
Intel Gaudi 3
AMD EPYC
Intel Xeon
RTX GPUs
Intel Arc

Models

Test the latest foundation models.

GPT-OSS
DeepSeek
Gemma
Phi
Qwen
Mistral
Llama 4+

Real-Time Metrics

Track performance across every dimension.

Throughput (tok/s)
Latency (TTFT/TPOT)
Power Usage (Watts)
Efficiency (tok/W)
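
The sketch below shows one common way these four metrics can be derived from per-request timestamps. It is a minimal illustration, not Metrum Insights code: the `RequestTrace` type and function names are hypothetical, TTFT is taken as time to first token, TPOT as average time per output token after the first, and efficiency is read here as throughput divided by average power draw, which is an assumption about the tok/W unit.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request record; all times in seconds."""
    start_s: float         # request sent
    first_token_s: float   # first output token received
    end_s: float           # last output token received
    output_tokens: int     # number of generated tokens


def ttft(t: RequestTrace) -> float:
    """Time to first token."""
    return t.first_token_s - t.start_s


def tpot(t: RequestTrace) -> float:
    """Average time per output token, excluding the first token."""
    if t.output_tokens < 2:
        return 0.0
    return (t.end_s - t.first_token_s) / (t.output_tokens - 1)


def throughput(traces: list[RequestTrace]) -> float:
    """Output tokens per second over the wall-clock span of all requests."""
    total_tokens = sum(t.output_tokens for t in traces)
    wall_s = max(t.end_s for t in traces) - min(t.start_s for t in traces)
    return total_tokens / wall_s


def efficiency(tok_per_s: float, avg_watts: float) -> float:
    """Tokens per second per watt (assumed meaning of tok/W)."""
    return tok_per_s / avg_watts


traces = [
    RequestTrace(start_s=0.0, first_token_s=0.5, end_s=2.5, output_tokens=101),
    RequestTrace(start_s=0.0, first_token_s=1.0, end_s=4.0, output_tokens=99),
]
tps = throughput(traces)
print(ttft(traces[0]), tpot(traces[0]), tps, efficiency(tps, 250.0))
```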

AI-Powered

AI-Powered Analysis with Creator

Leverage AI-powered analysis to automatically generate insights, identify bottlenecks, and receive optimization recommendations.

  • Automated report generation
  • Performance anomaly detection
  • Optimization suggestions
  • Natural language queries

Hardware Planning

Hardware Sizer: Right-Size Your AI Infrastructure

Plan GPU clusters, estimate costs, and optimize hardware selection for your AI workloads.

GPU Selection
Cluster Sizing
Cost Analysis
Performance Estimates

Real-Time Monitoring

Performance Visualization with Pulse

Real-time visualization of performance metrics across your benchmarking runs.
