Executive Summary
Mobile network data traffic is projected to reach several hundred exabytes per month by 2030, driven by an estimated 30+ billion connected devices generating both human and machine data.[1] As this growth accelerates, telecom operators face a widening gap between the speed at which equipment fails and the pace at which teams can troubleshoot manually. Cascading faults across tens of thousands of geographically dispersed Remote Radio Heads (RRHs) can escalate into widespread outages before operations teams identify a root cause. This paper presents a Multi-Agent Infrastructure Monitoring solution, built on Dell PowerEdge XE9785L servers with AMD Instinct MI355X accelerators, that closes that gap through autonomous, on-premises AI.
The platform ingests system logs from Baseband Units (BBUs), RRHs, and Operations Support Systems (OSS) in real time. A coordinated suite of specialized AI agents, powered by the Qwen3-235B reasoning model, continuously detects anomalies, determines root causes, and initiates remediation, replacing multi-hour manual troubleshooting cycles with automated resolution typically completing in two to eight minutes under concurrent load, and in under 90 seconds for isolated incidents.
Key Results at a Glance
| Metric | Result |
|---|---|
| Sub-90-Second Remediation | Detection to resolution in under 90 seconds for isolated faults |
| 1.5x to 2x Faster Event Processing | MI355X vs MI300X across tested configurations (3 to 30 BBUs) |
| 2.3 TB Combined GPU Memory | Single 8-GPU node, no multi-node sharding |
| Up to 1.6x Generational Throughput Gain | MI355X vs MI300X inference tokens per second at matched concurrency |
| 7,773 Tokens/sec Peak Throughput | At 30 BBUs / 90 RRHs max tested config |
| 2,253 Events/min at Scale | Sustained processing at max tested configuration |
The Telecom Infrastructure Challenge

Figure 1 | The Telecom Challenge
Telecom operators manage increasingly complex infrastructure under intensifying pressure as 5G deployments accelerate and network scope expands. In a typical Cloud Radio Access Network (C-RAN) architecture, RRHs handle radio frequency processing at cell sites while BBUs provide centralized baseband signal processing. OSS platforms integrate data from both components to monitor network health and maintain Quality of Service (QoS) requirements. When equipment fails or performance degrades, operations teams must rapidly identify affected components, determine root causes, and execute corrective actions before service quality suffers. Industry analyses estimate that unplanned network downtime costs Tier 1 operators upward of $100,000 per hour in lost revenue and customer churn.[2]
C-RAN Network Primer
- RRH (Remote Radio Head): Distributed across multiple cell sites to perform radio signal transmission and reception. Handles radio frequency (RF) processing at the network edge.
- BBU (Baseband Unit): Consolidated within a centralized processing pool to deliver high-performance baseband computation. Handles digital signal processing.
- OSS (Operations Support Systems): Software tools that analyze and manage the telecommunications network, including network monitoring, fault management, and performance optimization.
In a typical network, RRHs connect to the BBU pool through fronthaul links. BBUs can be dynamically assigned to serve clusters of RRHs in a many-to-one configuration.
Legacy monitoring approaches often fail to keep pace with the scale and velocity of modern networks. Manual log analysis cannot process the sheer volume of telemetry data generated by thousands of distributed network elements. Batch processing systems introduce delays that allow issues to escalate before detection. Siloed monitoring tools make it difficult to correlate events across BBUs, RRHs, and OSS platforms. These operational gaps carry measurable consequences: extended mean time to resolution, increased service disruptions, and elevated operational costs associated with dispatching field technicians for issues that could be diagnosed remotely.
The table below highlights the primary operational challenges that prevent network operations teams from achieving continuous situational awareness.
| Challenge | Current State | Business Impact |
|---|---|---|
| Delayed Detection | Batch systems and manual log reviews run on multi-hour cycles | Equipment failures escalate to widespread outages before identification |
| Cascading Failures | Single-point failures propagate across interconnected base stations | Localized issues become network-wide service disruptions |
| Manual Root Cause Analysis | Engineers manually correlate logs across BBU, RRH, and OSS systems | Extended mean time to resolution increases customer impact |
| Signal Degradation | Environmental factors and misconfigurations cause gradual performance decline | QoS deteriorates before thresholds trigger alerts |
| Geographic Distribution | Tens of thousands of RRHs deployed across dispersed locations | Dispatching field technicians for remote diagnostics adds cost and delays resolution |
Table 1 | Operational Challenges and Business Impact
Addressing these challenges requires a platform purpose-built for continuous, automated network intelligence. The following sections describe how the Multi-Agent Infrastructure Monitoring solution transforms fragmented troubleshooting workflows into a unified, real-time operations capability.
Solution Overview
At the core of this solution, a coordinated team of specialized AI agents autonomously monitors network health, detects anomalies, and executes remediation actions. When new telemetry data arrives from BBUs, RRHs, or OSS platforms, the system uses large language models to analyze log patterns, classify severity levels, and identify affected components. Specialized agents then collaborate to determine root causes and initiate or recommend corrective actions aligned with the specific issue type.
| Agent | Function | Operation Mode |
|---|---|---|
| Operations Manager Agent | Delegates tasks to domain-specific agents based on issue classification | Continuous |
| NOC Analyst Agent | Monitors BBU and RRH logs for anomalies and forwards issues to Operations Manager Agent | Continuous |
| Communication Link Monitor | Resolves communication link failures between BBU and RRH components | Event-triggered |
| Synchronization Specialist | Resolves timing and synchronization issues in telecom infrastructure | Event-triggered |
| Hardware Health Agent | Resolves hardware failure issues affecting BBU and RRH equipment | Event-triggered |
| Reporting Agent | Generates incident reports and executive summaries for operations review | User-initiated |
Table 2 | Multi-Agent Functions
Each agent maps to a specific operational gap identified in Table 1, ensuring that no category of network incident goes unaddressed.
Solution Flow

Figure 2 | Solution Flow
The platform operates through four integrated stages, each running continuously within the telecom operator's infrastructure. In the first stage, the Vector Data Pipeline ingests telemetry from multiple source types: system logs from BBUs and RRHs; event logs and QoS metrics from OSS platforms; and operational documents containing configuration and procedural information. A Kafka-based event streamer normalizes these inputs and routes them to the processing layer.
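The ingestion step can be pictured as a small normalization function. The sketch below is illustrative only: the field names and log line format are assumptions, not the product's actual wire schema, and in production the normalized record would be published to a Kafka topic rather than printed.

```python
import json
from datetime import datetime, timezone

def normalize_log(raw: str, source: str) -> dict:
    """Parse a raw 'TIMESTamp LEVEL component: message' line into a
    normalized event record for the processing layer (illustrative schema)."""
    ts, level, rest = raw.split(" ", 2)
    component, _, message = rest.partition(": ")
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source": source,          # "BBU", "RRH", or "OSS"
        "timestamp": ts,
        "severity": level,
        "component": component,
        "message": message.strip(),
    }

event = normalize_log(
    "2025-01-07T02:14:03Z ERROR BBU-12: fronthaul link to RRH-47 down", "BBU"
)
print(json.dumps(event, indent=2))
```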
The second stage transforms raw data into searchable knowledge. The bge-large-en text embedding model converts log entries and documents into vector representations, stored in PgVector (a vector similarity search extension for PostgreSQL) for semantic search. Time-series telemetry flows to GreptimeDB (an open-source distributed time-series database) for temporal analysis. This dual-database approach enables agents to query both semantic similarity (finding related past incidents) and temporal patterns (identifying performance trends).
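A semantic-similarity lookup against PgVector can be expressed with its cosine-distance operator (`<=>`). The table and column names below are hypothetical, since the paper does not publish the schema; the query shape itself is standard pgvector usage.

```python
def similar_incidents_sql(k: int = 3) -> str:
    """Build a parameterized PgVector query retrieving the k historical
    incidents closest (by cosine distance) to a query embedding.
    Table and column names are illustrative."""
    return (
        "SELECT incident_id, summary, embedding <=> %(qvec)s AS distance "
        "FROM incident_history "
        "ORDER BY embedding <=> %(qvec)s "
        f"LIMIT {k};"
    )

# A temporal query would go to GreptimeDB instead, e.g. aggregating a QoS
# metric over a time window -- the two stores answer different questions.
print(similar_incidents_sql(3))
```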
The third stage executes the agent workflows. When the NOC Analyst Agent detects a warning or error condition in incoming logs, it forwards the issue to the Operations Manager Agent. The Operations Manager Agent classifies the problem type and delegates to the appropriate specialist. The Communication Link Monitor handles connectivity failures, the Synchronization Specialist addresses timing issues, and the Hardware Health Agent manages equipment malfunctions. Each specialist agent queries the vector database for similar past incidents and uses the Qwen3-235B reasoning model to determine root cause and corresponding corrective actions.
Dashboard Capabilities

Figure 3 | Unified Command Center Dashboard
The final stage delivers actionable intelligence through a unified dashboard. The command center interface provides interactive Leaflet maps with geographic network visualization, real-time event streams, an agent chat interface, and GPU cluster monitoring within a single dashboard. Key components include:
- Base Station Status: Network operations staff see color-coded health indicators at a glance, eliminating the need to parse raw log files during active incidents. Green indicates normal operations, orange signals early warnings where agents are monitoring for potential degradation, and red indicates active issues that trigger automated resolution workflows.
- Global QoS Metrics: Operations teams monitor live performance without switching between multiple tools. The dashboard streams block rate, data drop, and call drop percentages alongside downlink and uplink throughput levels. Agents automatically correlate anomalies with underlying network or hardware events.
- Event Streams and Active Agents: Supervisors track workload distribution across the agent team in real time. The panel displays event counts from BBU, RRH, and OSS logs alongside active agent workflows, processed incidents, and monitored devices.
- Issue Alerts Panel: Engineers drill down from high-level alerts to root cause analysis without context switching. The interactive incident list maps affected RRHs, BBUs, and base stations with visual indicators for ongoing, resolved, or escalated issues.
- Report Generation: Compliance and operations teams export documentation for audits and post-incident reviews. Specific Incident Reports deliver detailed root cause analysis (RCA) with corrective actions and affected components. Executive Summary Reports aggregate insights across all base stations. Both formats export as JSON or PDF.
Example Scenario: Fronthaul Link Failure
To illustrate how these components work together, consider a fronthaul link failure scenario. At 2:14 AM, the communication link between BBU-12 and RRH-47 fails due to a firmware mismatch introduced during a routine update. Within seconds, the NOC Analyst Agent detects the ERROR log in the incoming telemetry stream and forwards it to the Operations Manager Agent. The Operations Manager classifies the issue as a connectivity failure and delegates to the Communication Link Monitor.
The Communication Link Monitor queries the vector database for similar past incidents and identifies three prior cases where firmware version mismatches caused identical symptoms. Using the Qwen3-235B reasoning model, the agent confirms the root cause, determines that RRH-47 requires a firmware rollback, and initiates an automated link restart sequence. The entire detection-to-remediation cycle completes in less than 90 seconds under moderate, single-incident load.
Under production conditions with concurrent monitoring of 30 BBUs and 90 RRHs, however, average workflow durations range from 102 to 463 seconds depending on issue complexity, as detailed in the Performance Benchmarking section.
Without the multi-agent system, this incident would typically wait for the next batch processing cycle, potentially leaving customers without service for hours. Operators might also need to dispatch field technicians, adding cost and further extending resolution time. With the multi-agent system, the autonomous workflow compresses a multi-hour outage into a remediation event completed in under two minutes.
Solution Architecture
The architectural decisions behind this solution reflect a fundamental requirement: every component must deliver the throughput needed for real-time network intelligence while operating entirely within the telecom operator's secure infrastructure. The platform combines optimized inference runtimes and a modular software stack designed for continuous operation at telecom-grade reliability.

Figure 4 | Solution Architecture
Software Stack
The software architecture layers optimized runtimes atop the AMD Radeon Open Compute platform (ROCm) 7.0. vLLM provides the inference runtime, delivering high-throughput token generation through continuous batching and PagedAttention memory management. This combination enables the solution to serve multiple concurrent agent requests while maintaining consistent, predictable latency for time-sensitive incident response.
Model Provisioning and Lifecycle Management
Production AI model deployment in telecom environments demands a streamlined path from selection through validation to runtime serving. This solution provisions the Qwen3-235B reasoning model and bge-large-en embedding model through Dell Enterprise Hub, integrated with the Hugging Face model repository. Dell Enterprise Hub provides pre-validated configurations optimized for Dell PowerEdge servers with AMD Instinct accelerators, ensuring compatibility across the hardware and software stack. The operational benefits of this approach are detailed in the Operational Ecosystem section.
For telecom operators managing multiple NOC sites, this centralized model management approach ensures consistency across deployments: every site runs the same validated model version with the same serving configuration, reducing the risk of inconsistent agent behavior across the network.
| Layer | Component | Function |
|---|---|---|
| Hardware Optimization | AMD ROCm 7.0 | GPU compute and memory management |
| Inference Runtime | vLLM v0.10.1 | High-throughput model serving with continuous batching |
| Agent Framework | AutoGen (Microsoft) | Multi-agent orchestration and asynchronous task execution |
| Agent Communication | MCP + A2A Protocol | Context sharing and inter-agent coordination |
| Reasoning Model | Qwen3-235B-A22B-Thinking | Root cause analysis and corrective action determination |
| Embedding Model | bge-large-en | Text embeddings for semantic search and similarity matching |
| Vector Database | PgVector + PostgreSQL | Vector storage, similarity search, and metadata management |
| Time-Series Database | GreptimeDB | High-frequency telemetry and event data storage |
| Event Streaming | Vector Data Pipeline | Real-time ingestion, transformation, and routing |
Table 3 | Software Stack Components
Agent Orchestration
The AutoGen framework (Microsoft's open-source multi-agent orchestration framework) coordinates the specialized agents that execute the monitoring and remediation workflows. The Model Context Protocol (MCP) acts as a standardized interface for context sharing across agents, while the Agent-to-Agent (A2A) protocol ensures secure, structured communication between microservices. Together, these components form the control plane that handles request routing, inter-agent orchestration, and operational telemetry for distributed AI workflows with complete traceability and observability.
Each agent accesses shared resources through well-defined interfaces: the vector database for semantic search across historical incidents, GreptimeDB for time-series telemetry analysis, and the model inference endpoints for AI-powered reasoning. This modular architecture enables independent scaling of individual components, such as adding embedding model replicas during peak event volumes, without disrupting active monitoring workflows. On a single server, the architecture processes thousands of network messages and log streams per day across hundreds of nodes, while supporting horizontal scaling for larger deployments.
The orchestration layer also handles agent failure recovery. If a specialist agent encounters an error during root cause analysis, the Operations Manager Agent reassigns the task to a backup agent or escalates to the reporting layer with a partial analysis. This fault-tolerant design ensures that a single agent failure does not leave a network incident unaddressed. All inter-agent messages and task handoffs are logged, providing full traceability for post-incident audit and compliance review.
Server Health and Uptime Management
A platform responsible for continuous network monitoring must itself maintain continuous availability. The Integrated Dell Remote Access Controller (iDRAC) provides out-of-band server management that operates independently of the host operating system and application stack. This independence is critical: if a software fault or GPU driver issue affects the monitoring application, iDRAC remains accessible for diagnostics and recovery.
iDRAC monitors the physical health of the Dell PowerEdge XE9785L server, including CPU and GPU temperatures, power supply status, fan speeds, memory integrity, and storage health. For the Multi-Agent Infrastructure Monitoring solution, this hardware-level visibility serves two functions.
First, iDRAC provides proactive alerting for server-side issues that could degrade monitoring performance. If a GPU accelerator begins exhibiting elevated temperatures or a power supply enters a degraded state, iDRAC generates alerts through standard protocols (SNMP, Redfish, email) before performance or agent responsiveness is affected. Operations teams can schedule maintenance during planned windows rather than responding to unplanned outages.
Second, iDRAC enables remote management for geographically distributed deployments. Telecom operators deploying the monitoring solution across multiple data centers or central offices can use iDRAC's remote console, firmware update, and power management capabilities without dispatching technicians. iDRAC's Redfish API also enables programmatic integration with existing IT service management (ITSM) platforms, providing a unified view of both the telecom infrastructure being monitored and the AI infrastructure performing the monitoring.
Safety and Human Oversight
The multi-agent system operates autonomously for common failure modes that have well-established remediation procedures. For high-impact actions, the architecture includes configurable approval gates. Operators can configure the system to require human confirmation before executing actions that affect multiple base stations, modify network configurations, or trigger firmware changes across device groups. All automated actions are logged with full evidence chains, enabling post-incident review and continuous policy refinement.
This graduated autonomy model recognizes that telecom operators maintain safety-critical responsibilities. Routine issue resolution (single-link restarts, individual device resets) proceeds automatically. Actions with a broader blast radius (multi-device firmware rollbacks, network-wide configuration changes) pause for operator approval through the dashboard interface. Operations teams can adjust these thresholds as they gain confidence in the system's recommendations over time.
Infrastructure Foundation
The agent workflows, inference throughput, and rapid remediation cycles described above place specific demands on the underlying hardware. The platform requires sustained GPU memory bandwidth for large-model inference, sufficient accelerator memory to host 235 billion parameters without multi-server distribution, and enough compute headroom for concurrent agent workloads alongside continuous log ingestion. The Dell PowerEdge XE9785L server equipped with AMD Instinct MI355X accelerators meets these requirements within a single-server footprint.
| Component | Specification |
|---|---|
| Server Platform | Dell PowerEdge XE9785L Server |
| Form Factor | 8U Rack Server |
| GPU Accelerators | 8x AMD Instinct MI355X Accelerators |
| GPU Memory | 2.3 TB aggregate HBM3e (288 GB per accelerator) |
| CPU | AMD EPYC Processor |
| Operating System | Ubuntu 22.04.5 LTS |
Table 4 | Dell PowerEdge XE9785L Hardware Configuration
The AMD Instinct MI355X accelerator provides 288 GB of HBM3e memory per accelerator, a 50 percent increase over the 192 GB available on the previous-generation MI300X. This expanded capacity directly enables efficient deployment of the Qwen3-235B reasoning model. Two MI355X accelerators can host the entire 235-billion-parameter model without aggressive quantization or distribution across multiple servers, either of which would introduce latency and operational complexity.
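The fit claim follows from simple arithmetic, sketched below under the assumption of one byte per parameter at FP8; KV cache and activations need additional headroom beyond the weight figure.

```python
# Back-of-envelope weight-memory check for Qwen3-235B on two MI355X GPUs.
params = 235e9
bytes_per_param = 1.0            # FP8 weights (assumption: 1 byte/param)
weights_gb = params * bytes_per_param / 1e9
capacity_gb = 2 * 288            # two MI355X accelerators at 288 GB each

print(f"weights ~= {weights_gb:.0f} GB of {capacity_gb} GB")  # weights ~= 235 GB of 576 GB
```

The remaining ~340 GB across the pair covers KV cache for concurrent agent requests, which is why no cross-server sharding is needed.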
With 2.3 TB of aggregate GPU memory across eight accelerators, the XE9785L hosts both the reasoning model and embedding model simultaneously while providing capacity for concurrent agent workloads. Memory-intensive operations, including serving multiple model instances, fine-tuning, and running concurrent reasoning agents, are all feasible on a single server. This consolidation simplifies procurement, reduces data center footprint, and eliminates inter-server communication latency that would degrade real-time incident response performance.
In the benchmarked configuration, two MI355X accelerators host the Qwen3-235B reasoning model in tensor-parallel mode. Two additional accelerators serve embedding model replicas to maintain retrieval throughput during peak event ingestion. The remaining four accelerators provide operational headroom: memory capacity for expanded inference context windows under concurrent agent workloads, queuing buffers during burst event periods, and reserved capacity for model updates or secondary model evaluation without interrupting active monitoring. This allocation ensures that the 30-BBU workload operates well within the server's resource envelope rather than at its ceiling, a design margin that production telecom deployments require for sustained reliability.
The AMD EPYC processor handles CPU-bound preprocessing: event streaming through the Vector Data Pipeline, log transformation, and database operations. During peak ingestion periods when network events generate high volumes of telemetry data, the high-core-count processor prevents CPU bottlenecks from limiting pipeline throughput.
ROCm and the vLLM Inference Pipeline
The inference performance demonstrated in this paper depends on the tight integration between AMD ROCm and the vLLM serving framework. ROCm 7.0 provides the GPU kernel libraries, memory management primitives, and inter-GPU communication layers that vLLM uses to implement continuous batching and PagedAttention. For the Qwen3-235B model deployed across two MI355X accelerators, ROCm manages tensor parallel inference with minimal inter-GPU communication overhead, delivering the 7,773 tokens-per-second throughput measured at peak load.
ROCm supports the Hugging Face model format natively, so new models published to the Hugging Face Hub (and curated through Dell Enterprise Hub) deploy on MI355X accelerators without format conversion or custom compilation. When the next generation of reasoning models becomes available, operators can evaluate them on existing hardware by updating the model weights in vLLM, with ROCm handling the low-level GPU resource management automatically. This upgrade path protects the hardware investment and ensures the monitoring solution can evolve as AI model capabilities advance.
Operational Ecosystem: From Deployment to Production
The Dell PowerEdge XE9785L server's value extends beyond raw compute performance. Three ecosystem capabilities elevate the server from a standalone inference platform into a managed, production-grade AI infrastructure component.
Dell Enterprise Hub, integrated with the Hugging Face model repository, provides a curated path from model selection to validated deployment. Operations teams select models from a catalog pre-validated against the XE9785L hardware configuration, covering MI355X memory capacity, ROCm version compatibility, and vLLM serving parameters. The platform generates deployment configurations and tracks model versions across the fleet, ensuring uniformity across multi-site telecom deployments and providing the change management audit trail that regulatory compliance requires.
Dell iDRAC delivers out-of-band server management that operates independently of the AI application stack. iDRAC continuously monitors GPU temperatures, power supply health, storage integrity, and fan performance, issuing proactive alerts before hardware issues impact monitoring capability. iDRAC's Redfish API enables integration with existing ITSM platforms, providing a unified view of both the telecom network being monitored and the AI infrastructure performing the monitoring. Its remote console and firmware management capabilities reduce the need for on-site technician visits, which is particularly valuable for deployments at edge locations or central offices with limited physical access.
AMD ROCm's open-source platform ensures that the entire inference stack remains auditable, portable, and free from proprietary lock-in. Models and pipelines built on standard frameworks run on ROCm without requiring framework-level modifications. Telecom security teams can inspect the GPU runtime codebase as part of their infrastructure certification process. When next-generation AMD Instinct accelerators become available, existing model deployments and serving configurations can be migrated forward without application-level changes, protecting the operator's multi-year infrastructure investment.
Approach Comparison
| Capability | Manual NOC Operations | Cloud-Based AIOps | Multi-Agent on Dell PowerEdge |
|---|---|---|---|
| Detection Latency | Batch cycle (minutes to hours) | Near real-time (cloud dependent) | Continuous monitoring, sub-minute detection |
| Root Cause Analysis | Manual log correlation | AI-assisted, requires data upload | Autonomous, RAG-powered |
| Remediation | Manual execution | Recommendation with manual execution | Automated with audit trail |
| Data Sovereignty | On-premises | Data leaves perimeter | On-premises, fully controlled |
| Scalability | Linear staff increase | Cloud-elastic, variable cost | Single-server, deterministic cost |
Table 5 | Monitoring Approach Comparison
Note: This comparison reflects architectural capabilities. The multi-agent solution column represents tested performance under simulated workloads at scales up to 30 BBUs / 90 RRHs. Cloud-based AIOps capabilities vary by vendor and deployment model.
Performance Benchmarking
Performance validation demonstrates that the solution scales effectively across increasing network complexity while maintaining the throughput required for real-time incident detection and remediation. Testing measured inference throughput, event processing capacity, and GPU resource utilization under production-representative workloads.
Test Configuration
The team conducted benchmarks on a Dell PowerEdge XE9785L server equipped with eight AMD Instinct MI355X accelerators. The solution deployed the Qwen3-235B-A22B-Thinking model in FP8 precision using vLLM v0.10.1 optimized for AMD ROCm 7.0. The reasoning model ran in tensor-parallel mode across two MI355X accelerators. The remaining six accelerators hosted the bge-large-en embedding model replicas and provided memory headroom for concurrent agent context windows, vector retrieval operations, and inference request queuing.
Testing targeted representative configurations for small (3 to 6 BBUs), medium (15 BBUs), and large (30 BBUs) deployments. Each configuration ran under sustained load to capture steady-state throughput and resource utilization characteristics.
Scalability Results
Event processing capacity continues to scale as monitoring scope increases, though throughput per monitored device declines at the larger configurations. At the maximum tested configuration of 30 BBUs and 90 RRHs, the system sustained 2,253 events per minute while generating 7,773 tokens per second of inference throughput.
| BBUs Monitored | RRHs Monitored | Events Processed/min | Throughput (tokens/sec) |
|---|---|---|---|
| 3 | 9 | 539 | 2,231 |
| 15 | 45 | 1,515 | 6,823 |
| 30 | 90 | 2,253 | 7,773 |
Table 6 | Scalability Performance on Dell PowerEdge XE9785L with AMD Instinct MI355X
At the largest tested configuration (30 BBUs / 90 RRHs), the system processes over 2,200 events per minute and completes typical workflows in 2 to 8 minutes, compressing hours of manual troubleshooting into minutes of automated response. To put these numbers in operational context, the system resolves most incidents within minutes of detection, with average workflow durations of 102 to 463 seconds depending on complexity. Against the industry-estimated $100,000-per-hour cost of unplanned downtime cited earlier, even modest reductions in mean time to resolution translate directly into avoided revenue loss and reduced customer churn.
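The operational stakes can be made concrete with a back-of-envelope calculation from the paper's figures. The two-hour manual cycle below is an assumption chosen for illustration, not a measured baseline.

```python
DOWNTIME_COST_PER_HOUR = 100_000  # industry estimate cited in this paper [2]

def avoided_cost(manual_hours: float, automated_seconds: float) -> float:
    """Avoided downtime cost for one incident when automated resolution
    replaces a manual troubleshooting cycle (illustrative arithmetic)."""
    return DOWNTIME_COST_PER_HOUR * (manual_hours - automated_seconds / 3600)

# Assumed 2-hour manual cycle replaced by the worst-case 463-second workflow:
print(round(avoided_cost(2.0, 463)))  # 187139
```

Even under conservative assumptions, a single avoided multi-hour outage covers a meaningful share of operating cost.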
Generational Comparison: MI355X vs. MI300X
To quantify the generational improvement, the same Qwen3-235B-A22B-Thinking model was benchmarked on both MI355X and MI300X accelerators using Dell PowerEdge XE9785L and XE9680 servers respectively.

Figure 5 | Events/Minute Scalability

Figure 6 | Tokens/Second Throughput
At the maximum tested configuration of 30 BBUs, the MI355X achieves 40 percent higher inference throughput (7,773 vs. 5,476 tokens per second) and 50 percent greater event processing capacity (2,253 vs. 1,470 events per minute) compared to the MI300X. This performance gap widens at mid-range configurations: at 15 BBUs, the MI355X processes 57 percent more inference tokens per second (6,823 vs. 4,342) and nearly double the events per minute (1,515 vs. 766).
This generational improvement is attributable to architectural enhancements, most notably the MI355X's expanded 288 GB HBM3e per accelerator (versus 192 GB on the MI300X), which reduces the memory management overhead that constrains inference speed at scale. With more memory per GPU, tensor parallel inference across two accelerators operates with lower resource contention, enabling higher sustained throughput under concurrent agent workloads. Additional factors contributing to the throughput gain include HBM3e bandwidth improvements and CDNA 4 compute architecture enhancements. Isolating the precise contribution of each factor requires controlled single-variable testing beyond the scope of this benchmark. For telecom operators evaluating infrastructure investments, these gains translate into expanded monitoring coverage and faster incident response within the same single-server footprint.
Latency Profile
End-to-end latency measurements confirm that the solution satisfies real-time operational requirements. Simple issue classification (such as confirming that an INFO-level log requires no action) completes in approximately 45 seconds. Complex multi-component root cause analysis with automated remediation requires up to 463 seconds. The following metrics capture the range across all tested scenarios:
- Average agent workflow duration: 102 to 463 seconds, depending on complexity
- Minimum workflow completion: 45 seconds for straightforward issue classification
- Time to first token (TTFT): 200 to 245ms for inference requests (log ingestion to model response, not the full NOC workflow)
- P95 end-to-end workflow latency (covering log ingestion, reasoning, and remediation): 43 seconds at low concurrency (3 BBUs) to 390 seconds at maximum tested load (30 BBUs), varying by query complexity and concurrent agent activity
These latency figures align with the rapid remediation demonstrated in single-incident scenarios. For straightforward failures, the system compresses a multi-hour manual outage into a resolution completed in under two minutes.
Limitations and Considerations
This architecture is validated for deployments of up to 30 BBUs and 90 RRHs per server. These benchmarking results reflect controlled test scenarios using simulated telecom log data. Production deployments may experience different throughput characteristics depending on log volume, event complexity, and the number of concurrent agent workflows.
The current implementation supports English-language log formats. Networks generating logs in other languages or non-standard formats may require additional parsing configuration.
Automated remediation actions demonstrated here (firmware rollback, link restart) represent common, well-understood failure modes. Complex multi-vendor interoperability issues may still require human escalation. The configurable approval gates described in the Safety and Human Oversight section give operators control over which actions proceed autonomously.
Finally, the 30-BBU configuration represents a large single-site deployment. At that scale, the inference engine queued 257 requests at peak, indicating that operators approaching this capacity should evaluate additional accelerator resources or model optimization strategies. Operators with substantially larger networks should plan for multi-server scaling.
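As a rough sanity check on that queueing figure, Little's law (L = λW) relates the reported peak queue depth and sustained event rate to an implied average queueing delay. Treating the peak depth of 257 as a steady-state average is a simplifying assumption, so this is an order-of-magnitude estimate only.

```python
# Little's law: L = lambda * W, so W = L / lambda.
# Inputs are the reported figures; steady-state behavior is assumed.
peak_queue_depth = 257                         # requests queued at peak (reported)
events_per_min = 2253                          # sustained event rate (reported)

arrival_rate = events_per_min / 60.0           # ~37.6 requests/s
avg_wait_s = peak_queue_depth / arrival_rate   # implied time spent queued
print(f"Implied average queueing delay: {avg_wait_s:.1f} s")
```

An implied delay of several seconds per request is consistent with the elevated P95 workflow latency observed at the 30-BBU configuration.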
Conclusion
Telecom operators face a fundamental choice: continue scaling manual processes that cannot keep pace with network complexity, or deploy autonomous systems that detect and resolve incidents faster than human teams can respond. The benchmarks and architecture presented in this paper demonstrate that the second path is now technically viable, with performance validated at scales representative of single-site C-RAN deployments using simulated operational data.
A coordinated team of specialized AI agents transforms network operations from reactive troubleshooting into continuous, proactive infrastructure management. These agents monitor around the clock for link failures, synchronization issues, hardware malfunctions, and performance degradation, all without human initiation. When incidents occur, the Qwen3-235B reasoning model correlates current events with historical patterns retrieved from the vector database, delivering root-cause diagnoses in seconds for well-characterized failures. For isolated incidents, common failure modes resolve in under two minutes, reducing mean time to resolution from hours to minutes.
Beyond incident response, the platform provides unified visibility through a single command center dashboard: geographic network visualization, live QoS metrics, event streams, and agent workflow status across all distributed base station infrastructure. Every automated action generates a complete audit trail, supporting compliance requirements and enabling continuous improvement through post-incident review. Dell iDRAC ensures the monitoring platform itself maintains the uptime that telecom operations demand, with out-of-band health management, proactive alerting, and remote administration.
The Dell PowerEdge XE9785L server with AMD Instinct MI355X accelerators provides the memory and compute density to run these workloads entirely on premises. Operators can deploy a frontier-scale reasoning model alongside embedding models and concurrent agent workflows on a single server, with no logs or telemetry leaving the network perimeter. This on-premises architecture eliminates cloud dependencies and external API calls that would introduce latency and data sovereignty concerns.
As network traffic grows and 5G deployments expand, the gap between manual monitoring capabilities and operational demands continues to widen. Operators that invest in autonomous monitoring infrastructure today, before that gap widens further, position themselves for higher service quality, lower operational costs, and faster response to network events.
To learn more about implementing this solution, contact Dell Technologies or request access to reference code at contact@metrum.ai.
Addendum: Key Concepts for IT Decision Makers
What is RAG, and why is it critical for enterprises?
Retrieval-Augmented Generation (RAG) is a natural language processing technique that enhances generated responses by incorporating external knowledge retrieved from a large corpus or database. This approach combines the strengths of retrieval-based models and generative models to deliver more accurate, informative, and contextually relevant outputs.
The key advantage of RAG is its ability to dynamically leverage external knowledge, allowing the model to generate responses informed not only by its training data but also by up-to-date and detailed information from the retrieval phase. This makes RAG particularly valuable in applications where factual accuracy and comprehensive details are essential, such as in network operations, incident management, and other fields that require precise information. RAG gives enterprises a practical mechanism for improving the accuracy, relevance, and efficiency of their information systems.
Why is Dell PowerEdge XE9785L with AMD Instinct MI355X well-suited for RAG solutions?
The Dell PowerEdge XE9785L server supports high-density GPU acceleration (up to eight MI355X accelerators) within a high-performance system architecture, making it well-suited for AI workloads that involve training, fine-tuning, and inference with large language models.
Effectively implementing RAG solutions requires robust hardware infrastructure that can handle both the retrieval and generation components. Key hardware features for RAG solutions include high-performance accelerator units and large memory and storage capacity. With 288 GB of HBM3e memory per GPU, a single AMD Instinct MI355X accelerator can host very large LLMs and their associated working memory. Optimized for generative AI, the MI355X accelerator delivers leadership AI/HPC performance and provides the memory bandwidth and compute density needed to drive high-throughput inference and generation in RAG pipelines.
What are Multi-Specialist Agents and Multi-Agent Frameworks?
Multi-Specialist Agents are domain-focused AI agents designed with specialized expertise to address distinct aspects of complex operational workflows. Each agent operates autonomously within its area of specialization, such as network diagnostics, hardware health, communication link analysis, or report generation, while coordinating with other agents to achieve a shared operational goal. These agents use reasoning models, contextual data retrieval, and adaptive decision-making to analyze issues, execute corrective actions, and generate insights in real time.
A Multi-Agent Framework refers to a coordinated system where multiple specialist agents collaborate dynamically to solve interrelated problems across different domains. In this framework, agents communicate, delegate tasks, and share context through structured workflows, ensuring that each task is handled by the most capable specialist. For example, in the telecom C-RAN monitoring solution, the Operations Manager Agent delegates tasks to domain-specific agents such as the NOC Analyst, Communication Link Monitor, Hardware Health Agent, and Reporting Agent.
By combining the intelligence of multiple specialized agents, the Multi-Agent Framework enables autonomous detection, analysis, and resolution of incidents across large-scale infrastructures. It ensures faster root-cause identification, reduced downtime, and comprehensive reporting through continuous collaboration and reasoning between agents. This architecture represents a key advancement toward self-governing AI systems capable of managing complex, real-time operational environments.
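The delegation pattern described above can be sketched in a few lines. The agent names follow the roles named in this paper, but the keyword-based routing and event strings are illustrative assumptions; production frameworks such as AutoGen coordinate agents through richer conversation-based workflows.

```python
class SpecialistAgent:
    """A domain-focused agent that claims events matching its keywords."""
    def __init__(self, name, keywords):
        self.name = name
        self.keywords = keywords

    def can_handle(self, event):
        return any(k in event.lower() for k in self.keywords)

    def handle(self, event):
        return f"{self.name} analyzing: {event}"

class OperationsManagerAgent:
    """Routes each incoming event to the first specialist whose domain matches,
    escalating to a human operator when no specialist claims it."""
    def __init__(self, specialists):
        self.specialists = specialists

    def delegate(self, event):
        for agent in self.specialists:
            if agent.can_handle(event):
                return agent.handle(event)
        return f"Escalating to human operator: {event}"

manager = OperationsManagerAgent([
    SpecialistAgent("Communication Link Monitor", ["link", "fiber", "cpri"]),
    SpecialistAgent("Hardware Health Agent", ["temperature", "fan", "psu"]),
    SpecialistAgent("NOC Analyst", ["sync", "alarm", "kpi"]),
])

print(manager.delegate("RRH-12 link down on sector 3"))
```

The explicit escalation path mirrors the configurable approval gates discussed earlier: anything outside the specialists' well-understood domains falls through to a human.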
System Under Test
| Component | Detail |
|---|---|
| Server Platform | Dell PowerEdge XE9785L Server |
| GPU Accelerators | 8x AMD Instinct MI355X Accelerator (288 GB HBM3e each) |
| CPU | AMD EPYC Processor (high core count) |
| Operating System | Ubuntu 22.04.5 LTS |
| Hardware Optimization | AMD ROCm 7.0 |
| Inference Runtime | vLLM v0.10.1 |
| Reasoning Model | Qwen3-235B-A22B-Thinking |
| Embedding Model | bge-large-en |
| Vector Database | PgVector + PostgreSQL |
| Time-Series Database | GreptimeDB |
| Agent Framework | AutoGen (Microsoft) |
Table 8 | System Under Test Configuration
Glossary of Technical Terms
| Term | Definition |
|---|---|
| A2A | Agent-to-Agent protocol for secure, structured communication between AI agent microservices |
| AutoGen | Microsoft's open-source multi-agent orchestration framework for coordinating AI agent workflows |
| BBU | Baseband Unit; centralized equipment for baseband signal processing in C-RAN architectures |
| bge-large-en | An open-source text embedding model used for semantic search and similarity matching |
| C-RAN | Cloud Radio Access Network; architecture that centralizes baseband processing while distributing radio units |
| GreptimeDB | An open-source distributed time-series database optimized for high-frequency telemetry data |
| HBM3e | High Bandwidth Memory 3e; latest generation high-bandwidth memory for GPU accelerators |
| iDRAC | Integrated Dell Remote Access Controller; out-of-band server management platform |
| MCP | Model Context Protocol; standardized interface for context sharing across AI agents |
| NVMe | Non-Volatile Memory Express; high-speed storage interface protocol |
| OSS | Operations Support Systems; software tools for network monitoring, fault management, and performance optimization |
| PagedAttention | Memory management technique for efficient GPU memory allocation during LLM inference |
| PgVector | A vector similarity search extension for PostgreSQL databases |
| QoS | Quality of Service; performance metrics ensuring network meets service level requirements |
| RAG | Retrieval-Augmented Generation; method combining document retrieval with AI text generation |
| ROCm | Radeon Open Compute platform; AMD's open-source GPU computing software platform |
| RRH | Remote Radio Head; distributed equipment handling RF processing at cell sites |
| TTFT | Time to First Token; latency measure for the initial response from an LLM inference request |
| vLLM | Open-source high-throughput inference engine for serving large language models |
Table 9 | Glossary
References
[1] Ericsson, "Ericsson Mobility Report, November 2024," Ericsson AB, Stockholm, Sweden, Nov. 2024. [Online]. Available: https://www.ericsson.com/en/reports-and-papers/mobility-report
[2] TM Forum, "Network Performance Benchmarking Report," TM Forum, 2024. See also: Analysys Mason, "Telecoms Network Downtime: Cost and Impact Analysis," Analysys Mason Ltd., London, U.K., 2023.
Image Sources
Dell Images: Dell Technologies Inc. Dell PowerEdge XE9785L Server. Image source: Dell DAM via Dell.com
AMD Images: AMD Inc. AMD Instinct MI300X, AMD Instinct MI355X Accelerator. Image source: AMD Media Library (https://library.amd.com)
Copyright 2026 Metrum AI, Inc. All Rights Reserved. This project was commissioned by Dell Technologies. Dell, Dell PowerEdge and other trademarks are trademarks of Dell Inc. or its subsidiaries. AMD, Instinct, ROCm, EPYC and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other product names are the trademarks of their respective owners.
DISCLAIMER - Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance.