Accelerate Your AI Performance Testing
Configure unlimited combinations of models, software, hardware, and hyperparameters in seconds.
Latest Features Available
VectorDB benchmarking, KV cache SSD offloading, and upgraded runtimes with support for the latest models.
VectorDB Bench (Milvus)
End-to-end vector database benchmarking with HNSW and DISKANN indexing, plus new storage metrics including IOPS, latency, throughput, queue depth, and DRAM usage.
SSD Offload via LMCache
KV cache disk offloading for NVIDIA Dynamo + vLLM, reducing GPU and DRAM pressure for large models and long contexts.
vLLM & SGLang Upgrades
vLLM upgraded to 0.19.0 with support for Gemma 4, GLM 5.1, and MiniMax 2.7. SGLang upgraded to 0.5.10.post1.
Fully Automated AI Performance Benchmarking
Parameters
Configure and test across multiple dimensions simultaneously.
Chips
Benchmark across architectures.
Models
Test the latest foundation models.
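As a rough illustration of what testing across multiple dimensions means, the sketch below expands a small set of options into every combination. The function name `expand_grid`, the dimension keys, and the placeholder model and chip names are assumptions made for this example; they are not the Metrum Insights configuration format.

```python
from itertools import product


def expand_grid(dimensions):
    """Return one dict per combination of the given dimension values."""
    keys = list(dimensions)
    value_lists = [dimensions[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Hypothetical dimensions; model and chip names are placeholders.
grid = {
    "model": ["model-a", "model-b"],
    "runtime": ["vLLM", "SGLang"],
    "chip": ["chip-x"],
    "batch_size": [1, 8, 32],
}

runs = expand_grid(grid)
print(len(runs))   # number of benchmark configurations in the sweep
print(runs[0])     # first configuration
```

Each added dimension multiplies the number of runs, which is why sweeps like this are usually generated rather than written by hand.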
Real-Time Metrics
Track performance across every dimension.
Throughput
tok/s
Latency
TTFT/TPOT
Power Usage
Watts
Efficiency
tok/W
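The four metrics above can be derived from a handful of timestamps and a power reading. The sketch below uses common definitions: TTFT is the time to the first token, TPOT is the average time per output token after the first, and efficiency is taken here as throughput per watt (tok/s per W). The `RunSample` fields and the `summarize` function are hypothetical names for illustration, and the exact formulas used by the product may differ.

```python
from dataclasses import dataclass


@dataclass
class RunSample:
    request_start_s: float  # time the request was sent
    first_token_s: float    # time the first output token arrived
    last_token_s: float     # time the last output token arrived
    output_tokens: int      # number of generated tokens
    avg_power_w: float      # mean power draw during the run, in watts


def summarize(sample: RunSample) -> dict:
    """Compute throughput, TTFT, TPOT, and efficiency for one request."""
    total_s = sample.last_token_s - sample.request_start_s
    ttft_s = sample.first_token_s - sample.request_start_s
    # TPOT averages over the tokens produced after the first one.
    decode_tokens = max(sample.output_tokens - 1, 1)
    tpot_s = (sample.last_token_s - sample.first_token_s) / decode_tokens
    throughput = sample.output_tokens / total_s
    return {
        "throughput_tok_s": throughput,
        "ttft_s": ttft_s,
        "tpot_s": tpot_s,
        "power_w": sample.avg_power_w,
        # Assumption: efficiency is throughput divided by average power.
        "efficiency_tok_s_per_w": throughput / sample.avg_power_w,
    }


sample = RunSample(
    request_start_s=0.0,
    first_token_s=0.5,
    last_token_s=2.5,
    output_tokens=101,
    avg_power_w=400.0,
)
metrics = summarize(sample)
print(metrics)
```

Separating TTFT from TPOT keeps prompt-processing cost and decode speed visible as distinct numbers, which a single end-to-end latency figure would hide.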
AI-Powered
AI-Powered Analysis with Creator
Use AI-powered analysis to generate insights, identify bottlenecks, and recommend optimizations automatically.
- Automated report generation
- Performance anomaly detection
- Optimization suggestions
- Natural language queries


Hardware Planning
Hardware Sizer: Right-Size Your AI Infrastructure
Plan GPU clusters, estimate costs, and optimize hardware selection for your AI workloads.
Real-Time Monitoring
Performance Visualization with Pulse
Real-time visualization of performance metrics across your benchmarking runs.

Ready to Accelerate Your AI Performance?
Join industry leaders using Metrum Insights to optimize their AI infrastructure.