Technical Whitepaper

Agentic AI for Multi-Layer URL Defense in Carrier Messaging with Dell PowerEdge XE7745 Server and NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs.

Abstract


A Carrier Architecture Reimagined to Address Malicious Messages, Powered by a two-node Dell PowerEdge™ XE7745 Rack Server Configuration with NVIDIA RTX™ PRO 6000 Blackwell Server Edition GPUs, Broadcom® Thor 2 Ethernet Controllers, and Broadcom® Tomahawk™ 5 Ethernet Switches.

January 2026

| Executive Summary

Smishing attacks, phishing delivered via SMS, MMS, and RCS, have become the fastest-growing threat vector in mobile communications. U.S. consumers reported losing $470 million to text message scams in 2024 alone, a figure five times higher than 2020.¹ Attackers exploit trusted messaging channels to steal credentials, intercept one-time passwords (OTPs), and initiate fraudulent transactions that cost the global telecommunications industry billions of dollars annually.

Carriers face a fundamental challenge: they must block malicious URLs in real time while minimizing false positives that disrupt legitimate business communications and maintaining sub-second message delivery latency. Traditional rule-based filtering cannot keep pace with rapidly evolving attack techniques, while purely AI-driven approaches introduce unacceptable latency for carrier-grade operations.

This whitepaper presents a dual-layer AI defense solution that achieves both speed and accuracy:

  • The Reflexive Layer delivers sub-millisecond filtering against a 2.5M+ URL blacklist, processing 1.93 million URLs per second.
  • The Reflective Layer performs deep behavioral analysis using specialized AI agents operating in sandboxed browser environments, powered by the Qwen3-30B-A3B-Thinking reasoning model running at 75,090 tokens per second.
  • Continuous Learning ensures that threats identified by deep analysis automatically update the fast-filtering cache, blocking novel attacks in real time for all subsequent encounters.

Deployed on Dell PowerEdge XE7745 servers with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs and Broadcom networking infrastructure, the solution scales horizontally to meet carrier-grade throughput requirements while maintaining the analytical depth needed to detect sophisticated social engineering attacks.

Table of Contents

Introduction

Solution Overview

Solution Architecture

Hardware Configuration & LLM Throughput Performance Evaluation

Key Performance Takeaways

Summary

References

Glossary

| Introduction

Messaging phishing, commonly called smishing, has emerged as the fastest-growing phishing vector. U.S. consumers reported losing $470 million to text message scams in 2024, a figure five times higher than 2020.¹ These attacks target SMS, MMS, RCS, and related messaging services, impersonating trusted brands and spoofing sender identities to steal credentials, intercept one-time passwords (OTPs), and initiate fraudulent transactions.

The threat landscape remains severe and growing:

  • Mobile users are 3× more likely to click malicious links in SMS than email.
  • Less than 35% of users understand what smishing is, contributing to high attack success rates.
  • A2P messaging channels carry sensitive business communications and OTPs, making them high-value targets.
  • Attack techniques evolve rapidly, with new campaigns appearing within hours of successful social engineering approaches.

Operators must block malicious messages before delivery while maintaining privacy, minimizing false positives, and meeting stringent real-time latency requirements. Failure to do so exposes carriers to fraud liability, invites regulatory scrutiny, accelerates customer churn, and erodes trust in messaging as a secure medium.

These channels carry sensitive user data and trusted brand communications—including authentication codes, banking alerts, and healthcare notifications—making them prime targets for phishing and impersonation attacks such as spoofed sender IDs, injected URLs, and obfuscated payloads.

To maintain message integrity, carriers must filter malicious URLs in A2P and business messaging flows before delivery, preserving user privacy, preventing false positives that disrupt legitimate business communications, and maintaining sub-second latency for high service quality.

This whitepaper presents a multi-layer, GenAI-powered URL defense strategy and reference implementation optimized for Dell PowerEdge XE7745 servers, NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, and Broadcom networking. The solution combines speed, scalability, and adaptive intelligence to detect and mitigate messaging-based phishing threats in real time.         

| Solution Overview

The system safeguards Application-to-Person (A2P) and business messaging traffic from phishing and malicious URLs in real time, while ensuring regulatory compliance and subscriber data privacy. It combines high-performance heuristic filtering, sandbox-based behavioral analysis, and AI-driven learning loops to detect, adapt, and respond to new attack patterns as they emerge.

The architecture comprises two complementary layers:

Reflexive (Fast) Layer: Performs real-time inline filtering and URL classification using deterministic heuristics and reputation models. This layer handles the vast majority of traffic with sub-millisecond latency.

Reflective (Deep Analysis) Layer: Executes full sandboxed inspection of uncertain URLs to identify hidden threats through behavioral and structural analysis using specialized AI agents.

The critical innovation is the feedback loop between these layers: once the Reflective Layer identifies a new threat, this intelligence immediately updates the Reflexive Layer's cache, ensuring that subsequent encounters are blocked in sub-millisecond time rather than requiring repeated deep analysis.

Figure 1 | Solution Flow from Text Message API from Reflexive Layer to Reflective Layer

Reflexive (Fast) Layer

The Reflexive Layer performs high-speed, inline detection and blocking of phishing and fraud across A2P and enterprise messaging traffic. It analyzes URL and domain features on the fly, identifying threats through multiple detection mechanisms:

  • URL Structure Analysis: Flags excessive subdomains, IP-based hosts, overly long or encoded URLs, non-standard ports, and URL shorteners.
  • Domain Reputation: Evaluates malicious Top-Level Domains (TLDs), random or numeric-heavy domains, and known phishing keywords.
  • Brand Spoofing Detection: Identifies impersonation attempts targeting financial institutions, delivery services, and government agencies.
  • Homoglyph Detection: Detects attacks using visually similar characters from Cyrillic, Greek, Armenian, or Devanagari scripts (e.g., 'аpple.com' using Cyrillic 'а' instead of Latin 'a').
  • File Extension Analysis: Detects malicious file extensions commonly used in payload delivery.
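The checks above can be sketched as a small heuristic scorer. This is an illustrative sketch only: the signal names, shortener/TLD lists, and thresholds are hypothetical placeholders, not the product's actual rules.

```python
# Illustrative Reflexive-Layer-style URL heuristics (lists and thresholds
# are example values, not the production rule set).
import re
from urllib.parse import urlparse

SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}   # assumption: example list
SUSPICIOUS_TLDS = {"xyz", "top", "ru"}           # assumption: example list
IP_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def url_signals(url: str) -> list[str]:
    """Return the heuristic signals a URL trips."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    signals = []
    if IP_HOST.match(host):
        signals.append("ip_host")                # raw IP instead of a DNS name
    if host.count(".") >= 4:
        signals.append("excessive_subdomains")   # subdomain stacking
    if host.rsplit(".", 1)[-1] in SUSPICIOUS_TLDS:
        signals.append("suspicious_tld")
    if host in SHORTENERS:
        signals.append("url_shortener")
    if parsed.port not in (None, 80, 443):
        signals.append("non_standard_port")
    if len(url) > 200 or "%2f" in url.lower():
        signals.append("encoded_or_long_url")    # encoding hiding the true path
    if not host.isascii():
        signals.append("possible_homoglyph")     # non-Latin characters in host
    return signals
```

In a production fast path these checks would feed a reputation score rather than a simple signal list, but the structure is the same: cheap, deterministic tests that run in microseconds per URL.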

Examples of Malicious URLs by Attack Type:

| Attack Type | Example URLs | Detection Signal |
|---|---|---|
| Excessive Subdomains | secure.login.verify.paypal.com.evil.ru | Subdomain stacking to bury real domain |
| Malicious TLD + Brand Spoof | amazon-support.xyz | Suspicious TLD + brand name |
| IP-Based Host | http://192.168.1.47/bankofamerica/login | Raw IP = no DNS accountability |
| Encoded/Long URL | site.com/login%2Fverify%3Ftoken%3D... (200+ chars) | URL encoding hiding true path |
| Homoglyph Attack | аpple.com (Cyrillic "а"), payρal.com (Greek ρ) | Unicode lookalikes for Latin chars |
| URL Shortener + Non-Standard Port | bit.ly/3xK9mQ → evil.com:8443/phish | Obscured destination + unusual port |
| Malicious Extension | invoice.pdf.exe | Double extension hiding executable |

Table 1 | Malicious URL Examples by Attack Type

With over 15 built-in heuristic and reputation checks, the Reflexive Layer delivers sub-millisecond verdict delivery for blacklisted URLs and is capable of millions of cache lookups per second. The layer is optimized for low-latency message filtering and scoring, supporting horizontal scaling to achieve deterministic latency and predictable performance under peak A2P loads.

Reflective (Deep Analysis) Layer

Figure 2 | Reflective Layer Workflow Example Evaluation  

The Reflective Layer performs deep per-URL inspection through a coordinated set of specialized analysis agents, all managed by a central orchestrator. The orchestrator queues incoming URLs from the Reflexive Layer and assigns them to available agents, enabling parallel execution and efficient use of compute resources within an isolated, sandboxed browser environment.

Five specialized AI agents work in coordination to analyze suspicious URLs:

| Agent | Function | Key Capabilities |
|---|---|---|
| Content Agent | Text analysis | Extracts and analyzes visible and embedded text to identify scam language, urgency cues, and brand impersonation patterns |
| HTML Agent | DOM parsing | Parses the document object model to detect credential collection forms, hidden elements, and embedded resources used in phishing workflows |
| JavaScript Agent | Code execution | Executes and deobfuscates client-side code to trace runtime behavior, monitor dynamic API calls, and uncover anti-analysis techniques |
| Network Agent | Infrastructure analysis | Follows redirect chains and evaluates hosting infrastructure through DNS, RDAP, and WHOIS analysis to assess reputation and campaign indicators |
| Scoring Agent | Verdict synthesis | Correlates content, structural, behavioral, and infrastructure signals to generate confidence-scored verdicts |

Table 2 | Reflective Layer Analysis Agents

All agent outputs are consolidated by the Scoring Agent, which correlates content, structural, behavioral, and infrastructure signals to generate a confidence-scored verdict. The orchestrator enforces execution policies and timeouts, aggregates results, and persists extracted indicators and metadata. These results feed back into the Reflexive Layer's fast-filtering pipeline, continuously improving detection accuracy and reducing analysis latency for future encounters.
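The orchestration pattern described above, parallel agent execution under an enforced timeout followed by verdict synthesis, can be sketched with asyncio. The agent bodies and the scoring rule here are trivial placeholders, not the actual Autogen-based agents or the production scoring logic.

```python
# Minimal sketch of an orchestrator fanning one URL out to analysis agents
# in parallel, then synthesizing a confidence-scored verdict.
# Agent logic and the scoring rule are illustrative placeholders.
import asyncio

async def content_agent(url: str) -> dict:
    return {"scam_language": "login" in url}      # placeholder check

async def html_agent(url: str) -> dict:
    return {"credential_form": "verify" in url}   # placeholder check

async def network_agent(url: str) -> dict:
    return {"bad_reputation": url.endswith(".ru")}  # placeholder check

AGENTS = [content_agent, html_agent, network_agent]

async def analyze(url: str, timeout: float = 5.0) -> dict:
    # Run all agents concurrently, enforcing a per-URL timeout,
    # as the orchestrator's execution policy would.
    results = await asyncio.wait_for(
        asyncio.gather(*(agent(url) for agent in AGENTS)), timeout
    )
    # "Scoring agent": naive confidence = fraction of tripped signals.
    signals = {k: v for r in results for k, v in r.items()}
    confidence = sum(signals.values()) / len(signals)
    return {"url": url, "confidence": confidence,
            "verdict": "malicious" if confidence >= 0.5 else "benign"}

verdict = asyncio.run(analyze("secure-verify-login.bank.ru"))
```

The real Scoring Agent weighs heterogeneous signals from five agents rather than averaging booleans, but the control flow, gather with timeout, then synthesize, is the core of the design.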

Data & Message Flow

Figure 3 | Message Flow: Reflexive & Reflective Layers

Messages enter through the API Gateway, which routes incoming traffic to the Reflexive Layer. The Orchestration Service manages flow control between the Fast URL Filter and supporting backend services.

The Fast URL Filter, a key component of the Reflexive Layer, performs lookups against a Valkey-backed blacklist containing millions of entries. Valkey is a high-performance Redis alternative optimized for this workload, achieving sub-millisecond (830–990 μs) lookup and scoring latency across 2.5 million blacklisted URLs and multiple regex patterns on the Dell PowerEdge XE7745 cluster.

Known malicious URLs are resolved instantly, while uncertain or low-confidence URLs are escalated to the Reflective Layer via a message queue. The escalation threshold is configurable based on operator risk tolerance and traffic patterns.
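The fast-path decision just described, cache hit returns an instant verdict, a miss enqueues the URL for deep analysis, can be sketched as follows. A plain dict stands in for the Valkey cache and a list for the message queue; the threshold value and score encoding are illustrative, not the production design.

```python
# Sketch of the Reflexive fast path: cache hit -> instant verdict,
# cache miss -> asynchronous escalation to the Reflective queue.
# A dict stands in for the Valkey cache; a list for the message queue.

blacklist_cache = {"evil.ru/phish": 1.0}   # URL -> malicious score (stand-in)
reflective_queue = []                      # escalation queue (stand-in)

ESCALATION_THRESHOLD = 0.7                 # operator-configurable (example value)

def filter_url(url: str) -> str:
    score = blacklist_cache.get(url)
    if score is not None and score >= ESCALATION_THRESHOLD:
        return "block"                     # sub-millisecond cache hit
    if score is None:
        reflective_queue.append(url)       # deep analysis happens asynchronously
        return "allow_pending_analysis"
    return "allow"                         # known, below threshold
```

In production the dict lookup would be a Valkey GET and the list append a message-queue publish, so the synchronous response time stays bounded by the cache lookup alone.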

Within the Reflective Layer, each URL undergoes sandbox analysis to detect phishing, data exfiltration, or obfuscation behavior. Indicators of Compromise (IOCs), including domains, IP addresses, and raw webpage content, are extracted and stored for correlation and threat intelligence sharing.

Analysis results feed back to the Orchestration Service to update reputation lists and learning models. This feedback loop is critical: once a threat is identified by the Reflective Layer, the Reflexive Layer blocks all future instances in sub-millisecond time.
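The feedback loop can be shown end to end in a few lines: a Reflective verdict is written back to the cache, so the next encounter with the same URL is a fast-path block instead of another deep analysis. Names, the threshold, and the dict stand-in for Valkey are all illustrative.

```python
# Sketch of the continuous-learning loop: the Reflective Layer's verdict
# updates the blacklist cache, converting future encounters into
# sub-millisecond fast-path blocks. Dict stands in for Valkey.

cache = {}

def reflective_verdict(url: str, confidence: float) -> None:
    if confidence >= 0.5:          # illustrative threshold
        cache[url] = confidence    # production: Valkey SET (optionally with TTL)

def fast_lookup(url: str) -> str:
    return "block" if url in cache else "escalate"

first = fast_lookup("new-threat.xyz/otp")    # unseen URL -> escalated
reflective_verdict("new-threat.xyz/otp", 0.93)
second = fast_lookup("new-threat.xyz/otp")   # now a cache hit -> blocked
```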

The architecture is horizontally scalable and extensible to various messaging protocols, enabling consistent protection across SMS, MMS, RCS, and future communication formats without re-engineering the pipeline.

| Solution Architecture

Figure 4 | Solution Architecture: Hardware to Application Stack

The solution's architecture is built upon a high-performance Dell PowerEdge XE7745 hardware foundation, utilizing NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs and Broadcom Thor 2 Ethernet Controllers to deliver the computational density required for carrier-grade message processing.

This physical layer is optimized via NVIDIA CUDA® Toolkit 13.0 and managed through Kubernetes, ensuring that the system can scale horizontally to meet peak A2P traffic demands. The integration of vLLM as the inference runtime allows the platform to serve the Qwen/Qwen3-30B-A3B-Thinking-2507 reasoning model with high efficiency, utilizing BF16 precision to maintain a superior balance between numerical accuracy and memory throughput.

The orchestration of the system is handled by an Apache APISIX gateway and an Autogen-based multi-agent framework, which manages the transition of suspicious URLs from the Reflexive Layer to the deep-analysis agents.

Within the Reflective Layer, URLs are processed through specialized agents operating in a gVisor-protected sandbox. gVisor provides an application kernel that intercepts system calls, isolating sandbox execution from the host system to prevent malware escape. The Qwen3-30B-A3B-Thinking model acts as the primary reasoning engine, performing semantic evaluation of scam language and intent.

By running the model at BF16 precision, the system achieves the necessary inference speed to generate real-time confidence scores while preserving the fine-grained linguistic nuances required to identify sophisticated brand impersonation and social engineering tactics.

Data integrity and observability are maintained through a robust backend stack comprising PostgreSQL for persistent IOC storage, Valkey for high-speed cache lookups, and a Prometheus metrics collector for real-time performance monitoring.

The result is a self-reinforcing loop where the Scoring Agent correlates signals from the HTML, JavaScript, Network, and Content agents to produce a final verdict. These insights are immediately fed back into the blacklist URL cache, ensuring that once a threat is identified by the Reflective Layer, it is blocked for all subsequent encounters. This configuration enables high-throughput, AI-accelerated security and compliance enforcement across global carrier environments.

| Hardware Configuration & LLM Throughput Performance Evaluation

The performance evaluation was conducted using the following hardware configuration, which provides the compute density, parallel throughput, and network fabric required to support AI-driven URL analysis pipelines at carrier scale:

  • 2× Dell PowerEdge™ XE7745 Rack Servers.
  • 8× NVIDIA RTX™ PRO 6000 Blackwell Server Edition GPUs per server (16 total).
  • Broadcom® Thor 2 (BCM57608) Ethernet Controllers—one per GPU for dedicated network I/O.
  • Dell PowerSwitch Z9864F-ON with Broadcom® Tomahawk™ 5 Ethernet Switches.

To evaluate the hardware, a performance analysis was first conducted on LLM throughput as a proxy for Reflective Layer token generation, using the Qwen3-30B-A3B-Thinking-2507 model. LLM throughput was measured across two token-window configurations, 128/128 and 2048/2048, to evaluate how input size affects Reflective Layer analysis performance. Token windows define the input/output context size; shorter windows enable faster processing but limit analysis depth.

Measurements were captured on both single-node and dual-node Dell PowerEdge XE7745 server configurations, using the model's maximum-completion-rate operating point for each token window.

Figure 5 | LLM Max Completion Rate by Token Window and Node Configuration

On a single Dell PowerEdge XE7745, the system achieved 47,371 tokens/sec at the 128/128 window, compared to 12,064 tokens/sec at 2048/2048. The dual-node configuration scales this further, reaching 75,090 tokens/sec at the 128/128 window, achieving near-linear scaling.

Solution Performance Evaluation Methodology

The methodology used for performance evaluation focuses on measuring the throughput and latency of the two architectural layers:

Reflexive Layer: Measurement of maximum URL filtering throughput under CPU load, including cache lookup performance and pattern matching efficiency.

Reflective Layer: Evaluation of overall throughput measured in URLs processed per minute, assessment of end-to-end performance under simultaneous GPU load, and the learning rate associated with cache updates after URL evaluation.

Reflexive Layer Performance Evaluation Methodology  

The Reflexive Layer benchmarking assesses the maximum URL processing throughput of the system's fast-path component, which leverages a quick-lookup cache and is predominantly CPU-bound.

Benchmarking was conducted by sending 100,000 text message requests at high concurrency (800 simultaneous connections). The messages contain malicious URLs selected from the preloaded Valkey-backed cache of 2.5 million blacklisted URLs, ensuring that each request results in a cache hit and isolating the performance of the Reflexive Layer.

The primary variable in this evaluation is the batch size, defined as the number of URLs per message, which is systematically increased from 1 to 350 to assess how the system handles progressively complex workloads. In the dual-node configuration, the workload is evenly distributed across both nodes to quantify the benefits of horizontal scaling.

Reflexive Layer Performance Results

Figure 6 | Reflexive Layer URLs/Second vs Batch Size

URLs/sec (URLs per second) quantifies the throughput of the Reflexive Layer, representing the number of malicious URLs it can analyze and filter per second.

Throughput increases consistently with batch size. The dual-node configuration achieves 1.93M URLs/sec, whereas the single-node setup reaches ~955K URLs/sec at maximum batch size. This confirms that Reflexive Layer performance scales horizontally and that CPU-bound fast-path processing remains efficient across both nodes.

End-to-End Application Latency (ms)

End-to-end application latency represents the average application-level response time as calculated by Apache Bench (ab): the total benchmark execution time divided by the number of completed requests, i.e., the mean client-observed response time.

The system follows an API-as-a-service architecture and is designed to return an immediate response for every request. Each request is synchronously evaluated against the Valkey-based URL cache containing approximately 2.5 million blacklisted URLs. Requests not resolved by the cache are escalated to the reflective analysis layer asynchronously, ensuring that downstream processing does not affect the reported response time.

At a batch size of 350 URLs per request, the single-node configuration sustains 2,731 requests per second with an average response time of approximately 290 ms. Scaling to a dual-node configuration increases throughput to 5,512 requests per second while maintaining comparable latency of approximately 290 ms, demonstrating efficient horizontal scaling with stable response-time characteristics.
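As a rough consistency check, Little's law (mean latency ≈ in-flight requests / throughput) reproduces the reported ~290 ms from the 800-connection methodology, assuming the dual-node run maintains 800 connections per node (1,600 total in flight). That per-node assumption is ours, not stated in the benchmark description.

```python
# Little's-law sanity check of the reported figures:
# mean latency ~= concurrent in-flight requests / throughput.
concurrency_per_node = 800        # simultaneous connections (from the methodology)
single_node_rps = 2_731           # reported requests/sec, single node
dual_node_rps = 5_512             # reported requests/sec, dual node

single_latency_ms = concurrency_per_node / single_node_rps * 1000        # ~293 ms
# Assumption: the dual-node run keeps 800 connections per node (1,600 total).
dual_latency_ms = (2 * concurrency_per_node) / dual_node_rps * 1000      # ~290 ms
```

Both values land at roughly 290 ms, consistent with the claim that latency stays stable as throughput doubles.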

Reflective Layer Performance Evaluation Methodology  

The Reflective Layer benchmarking evaluates the full end-to-end security pipeline under combined CPU and GPU load, encompassing both the initial fast-processing layer and the deep analysis layer. The objective is to measure end-to-end latency and GPU utilization during comprehensive security analysis.

The test uses a controlled set of 58 internally hosted malicious URLs that are deliberately excluded from the reflexive layer cache, ensuring all requests are escalated for deep analysis, LLM-based scoring, and cache updates, thereby simulating cache-miss behavior. Requests are executed in sequential batches, with each batch initiated only after the previous one completes, ensuring deterministic processing and accurate alignment between URLs evaluated and cache learning behavior. The evaluation is performed on both single-node and dual-node configurations, with the dual-node setup scaling LLM and agent replicas to assess distributed, full-pipeline performance.

Reflective Layer Performance Results

Throughput (URLs Evaluated/min)

URLs evaluated/min is the total number of full security analyses successfully completed by all agents and the LLM per minute. It measures the fundamental rate at which the agents can process suspicious URLs and generate an outcome.

Figure 7 | URLs Evaluated per Minute vs Input Requests

Reflective throughput scales with the number of nodes. The dual-node configuration nearly doubles the number of URLs analyzed per minute, ensuring that reflective workloads remain aligned with incoming URL volume without creating backpressure on the Reflexive Layer.

Reflective Layer Learning Rate (New URLs Cached/min)

Learning Rate reflects the rate at which new security intelligence is generated, measuring the number of malicious URLs added to the blacklist URLs cache per minute following reflective analysis. Because all 58 evaluated URLs are malicious by design, the Learning Rate is expected to closely align with the analysis throughput (URLs evaluated per minute), with minor variations potentially arising from timing and aggregation effects, particularly in multi-node configurations.

Figure 8 | Learning Rate vs Input Requests

Cache update rate increases with input volume, with the dual-node system achieving higher update rates across all test conditions. At 15 input requests, the single-node configuration achieves a learning rate of 3.58 URLs/min, while the dual-node configuration reaches 5.15 URLs/min. This reflects the increased availability of Reflective processing capacity and confirms that parallel agent execution improves the speed of cache enrichment.

| Key Performance Takeaways

1) Reflexive Layer provides rapid URL filtering using an in-memory cache, relying on CPU processing. The single node handled 955K URLs/sec, while the dual-node setup reached 1.93M URLs/sec (batch=350). This near-linear 2× increase with double the nodes shows that the layer scales efficiently horizontally for real-time threat blocking.

2) For a single Dell PowerEdge XE7745 node running the Qwen3-30B-A3B-Thinking model, the 128/128 token window achieves 47,371 tokens per second, compared to 12,064 tokens per second for the 2048/2048 window. Multi-node deployment scales these to 75,090 and 23,297 tokens per second respectively, demonstrating the throughput scaling achievable with additional nodes.

3) URLs/min represents the system's raw throughput: how many URLs complete full security analysis per minute across all agents. Learning Rate specifically tracks malicious URLs identified and cached, reflecting the threat detection output of the multi-agent pipeline, where each URL triggers five GPU-based LLM analysis agents (HTML, JavaScript, content, network, and scoring). Single-node throughput peaked at 3.58 URLs/min with a learning rate of 3.58, while dual-node achieved 5.83 URLs/min with a learning rate of 5.15, demonstrating that distributing the LLM workload across additional GPUs enables higher throughput for real-time threat detection and cache updates.

Across all KPIs (URLs/sec, learning rate, Reflective throughput, and response latency), the dual-node configuration provides measurable performance improvements with predictable scaling characteristics. These results validate the architectural model in which Reflexive processing absorbs the majority of workload at low latency, while Reflective analysis and learning processes operate in parallel without degrading fast-path responsiveness.

| Summary

This carrier-grade phishing defense solution filters millions of malicious URLs per second while continuously learning from new threats, delivering both the sub-millisecond latency carriers require and the deep analysis accuracy that emerging threats demand.

The architecture cleanly separates high-speed, CPU-based Reflexive filtering from GPU-accelerated Reflective analysis, enabling rapid URL screening at scale while reserving GPU resources for deeper inspection. Deployed on Dell PowerEdge XE7745 servers with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, the platform delivers efficient parallel processing for LLM-driven agent analysis, with throughput scaling predictably as nodes are added.

Broadcom Thor 2 Ethernet Controllers and Tomahawk 5–based Dell PowerSwitch networking provide ultra-low-latency packet processing and non-blocking 800G fabric connectivity, ensuring that both message ingestion and inter-node coordination scale without congestion. This hardware-software co-design allows the system to sustain high URL evaluation rates while distributing complex, multi-agent GPU workloads across nodes, accelerating threat detection and cache learning as capacity expands.

Together, Dell and Broadcom deliver the performance foundation for Metrum AI’s horizontally scalable, AI-driven messaging security platform, a zero-compromise solution combining speed, precision, and adaptability for next-generation carrier defense.

To learn more, please request access to our reference implementation by contacting us at contact@metrum.ai.

| References

Image References

  • Dell Technologies — Dell PowerEdge™ XE7745 Server. Source: Dell.com
  • NVIDIA Corporation — NVIDIA RTX™ PRO 6000 Blackwell Server Edition GPUs. Source: Nvidia.com
  • Broadcom Inc. — Broadcom® Thor™ 2 Ethernet Controllers and Broadcom® Tomahawk™ 5 Ethernet Switches. Source: Broadcom.com.

¹ Federal Trade Commission, "New FTC Data Show Top Text Message Scams of 2024; Overall Losses to Text Scams Hit $470 Million," April 16, 2025.

https://www.ftc.gov/news-events/news/press-releases/2025/04/new-ftc-data-show-top-text-message-scams-2024-overall-losses-text-scams-hit-470-million


Copyright © 2026 Metrum AI Inc. All Rights Reserved.

This project was commissioned by Dell Technologies. Dell and other trademarks are trademarks of Dell Inc. or its subsidiaries. NVIDIA, RTX and other NVIDIA product names are trademarks of NVIDIA corporation. Broadcom and Broadcom product names are trademarks of Broadcom or its affiliates. All other product names mentioned are trademarks or registered trademarks of their respective owners.

***DISCLAIMER - Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance.

| Glossary

| Term | Definition |
|---|---|
| A2P | Application-to-Person messaging; messages sent from business applications to mobile subscribers |
| BF16 | Brain Floating Point 16; a numeric format that reduces memory requirements while maintaining accuracy for AI inference |
| Homoglyph | Characters from different scripts that appear visually similar (e.g., Cyrillic 'а' vs. Latin 'a') |
| IOC | Indicator of Compromise; forensic artifact indicating malicious activity (domains, IPs, file hashes) |
| MMS | Multimedia Messaging Service; messaging standard supporting images, audio, and video |
| OTP | One-Time Password; temporary authentication codes often delivered via SMS |
| RCS | Rich Communication Services; enhanced messaging protocol supporting typing indicators, read receipts, and richer media |
| RDAP | Registration Data Access Protocol; modern replacement for WHOIS providing domain registration information |
| Smishing | SMS phishing; fraudulent messages designed to steal credentials or initiate unauthorized transactions |
| TLD | Top-Level Domain; the rightmost segment of a domain name (e.g., .com, .xyz, .bank) |
| Valkey | High-performance Redis-compatible in-memory data store optimized for caching workloads |
| vLLM | High-throughput inference engine optimized for serving large language models |

Table 3 | Glossary of Terms