Executive Summary
Mobile network data traffic is projected to reach several hundred exabytes per month by 2030, driven by an estimated 30+ billion connected devices generating both human and machine data.[1] As this growth accelerates, telecom operators face a widening gap between the speed at which equipment fails and the pace at which teams can troubleshoot manually. Cascading faults across tens of thousands of geographically dispersed Remote Radio Heads (RRHs) can escalate into widespread outages before operations teams identify a root cause. This paper presents a Multi-Agent Infrastructure Monitoring solution, built on Dell PowerEdge XE9785L servers with AMD Instinct MI355X accelerators, that closes that gap through autonomous, on-premises AI.
The platform ingests system logs from Baseband Units (BBUs), RRHs, and Operations Support Systems (OSS) in real time. A coordinated suite of specialized AI agents, powered by the Qwen3-235B reasoning model, continuously detects anomalies, determines root causes, and initiates remediation, replacing multi-hour manual troubleshooting cycles with automated resolution typically completing in two to eight minutes under concurrent load, and in under 90 seconds for isolated incidents.
Key Results at a Glance
| Metric | Result |
|---|---|
| Sub-90-Second Remediation | Detection to resolution in under 90 seconds for isolated faults |
| 1.5x to 2x Faster Event Processing | MI355X vs MI300X across tested configurations (3 to 30 BBUs) |
| 2.3 TB Combined GPU Memory | Single 8-GPU node, no multi-node sharding |
| Up to 1.6x Generational Throughput Gain | MI355X vs MI300X inference tokens per second at matched concurrency |
| 7,773 Tokens/sec Peak Throughput | At 30 BBUs / 90 RRHs max tested config |
| 2,253 Events/min at Scale | Sustained processing at max tested configuration |
The Telecom Infrastructure Challenge

Figure 1 | The Telecom Challenge
Telecom operators manage increasingly complex infrastructure under intensifying pressure as 5G deployments accelerate and network scope expands. In a typical Cloud Radio Access Network (C-RAN) architecture, RRHs handle radio frequency processing at cell sites while BBUs provide centralized baseband signal processing. OSS platforms integrate data from both components to monitor network health and maintain Quality of Service (QoS) requirements. When equipment fails or performance degrades, operations teams must rapidly identify affected components, determine root causes, and execute corrective actions before service quality suffers. Industry analyses estimate that unplanned network downtime costs Tier 1 operators upward of $100,000 per hour in lost revenue and customer churn.[2]
C-RAN Network Primer
- RRH (Remote Radio Head): Distributed across multiple cell sites to perform radio signal transmission and reception. Handles radio frequency (RF) processing at the network edge.
- BBU (Baseband Unit): Consolidated within a centralized processing pool to deliver high-performance baseband computation. Handles digital signal processing.
- OSS (Operations Support Systems): Software tools that analyze and manage the telecommunications network, including network monitoring, fault management, and performance optimization.
In a typical network, RRHs connect to the BBU pool through fronthaul links. BBUs can be dynamically assigned to serve clusters of RRHs in a many-to-one configuration.
Legacy monitoring approaches often fail to keep pace with the scale and velocity of modern networks. Manual log analysis cannot process the sheer volume of telemetry data generated by thousands of distributed network elements. Batch processing systems introduce delays that allow issues to escalate before detection. Siloed monitoring tools make it difficult to correlate events across BBUs, RRHs, and OSS platforms. These operational gaps carry measurable consequences: extended mean time to resolution, increased service disruptions, and elevated operational costs associated with dispatching field technicians for issues that could be diagnosed remotely.
The table below highlights the primary operational challenges that prevent network operations teams from achieving continuous situational awareness.
| Challenge | Current State | Business Impact |
|---|---|---|
| Delayed Detection | Batch systems and manual log reviews run on multi-hour cycles | Equipment failures escalate to widespread outages before identification |
| Cascading Failures | Single-point failures propagate across interconnected base stations | Localized issues become network-wide service disruptions |
| Manual Root Cause Analysis | Engineers manually correlate logs across BBU, RRH, and OSS systems | Extended mean time to resolution increases customer impact |
| Signal Degradation | Environmental factors and misconfigurations cause gradual performance decline | QoS deteriorates before thresholds trigger alerts |
| Geographic Distribution | Tens of thousands of RRHs deployed across dispersed locations | Dispatching field technicians for remote diagnostics adds cost and delays resolution |
Table 1 | Operational Challenges and Business Impact
Addressing these challenges requires a platform purpose-built for continuous, automated network intelligence. The following sections describe how the Multi-Agent Infrastructure Monitoring solution transforms fragmented troubleshooting workflows into a unified, real-time operations capability.
Solution Overview
At the core of this solution, a coordinated team of specialized AI agents autonomously monitors network health, detects anomalies, and executes remediation actions. When new telemetry data arrives from BBUs, RRHs, or OSS platforms, the system uses large language models to analyze log patterns, classify severity levels, and identify affected components. Specialized agents then collaborate to determine root causes and initiate or recommend corrective actions aligned with the specific issue type.
| Agent | Function | Operation Mode |
|---|---|---|
| Operations Manager Agent | Delegates tasks to domain-specific agents based on issue classification | Continuous |
| NOC Analyst Agent | Monitors BBU and RRH logs for anomalies and forwards issues to Operations Manager Agent | Continuous |
| Communication Link Monitor | Resolves communication link failures between BBU and RRH components | Event-triggered |
| Synchronization Specialist | Resolves timing and synchronization issues in telecom infrastructure | Event-triggered |
| Hardware Health Agent | Resolves hardware failure issues affecting BBU and RRH equipment | Event-triggered |
| Reporting Agent | Generates incident reports and executive summaries for operations review | User-initiated |
Table 2 | Multi-Agent Functions
Each agent maps to a specific operational gap identified in Table 1, ensuring that no category of network incident goes unaddressed.
Solution Flow

Figure 2 | Solution Flow
The platform operates through four integrated stages, each running continuously within the telecom operator's infrastructure. In the first stage, the Vector Data Pipeline ingests telemetry from multiple source types: system logs from BBUs and RRHs; event logs and QoS metrics from OSS platforms; and operational documents containing configuration and procedural information. A Kafka-based event streamer normalizes these inputs and routes them to the processing layer.
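The ingestion step can be pictured as a small normalization function. The sketch below is illustrative only: the field names and log line format are assumptions, not the product's actual wire schema, and in production the normalized record would be published to a Kafka topic rather than printed.

```python
import json
from datetime import datetime, timezone

def normalize_log(raw: str, source: str) -> dict:
    """Parse a raw 'TIMESTamp LEVEL component: message' line into a
    normalized event record for the processing layer (illustrative schema)."""
    ts, level, rest = raw.split(" ", 2)
    component, _, message = rest.partition(": ")
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source": source,          # "BBU", "RRH", or "OSS"
        "timestamp": ts,
        "severity": level,
        "component": component,
        "message": message.strip(),
    }

event = normalize_log(
    "2025-01-07T02:14:03Z ERROR BBU-12: fronthaul link to RRH-47 down", "BBU"
)
print(json.dumps(event, indent=2))
```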
The second stage transforms raw data into searchable knowledge. The bge-large-en text embedding model converts log entries and documents into vector representations, stored in PgVector (a vector similarity search extension for PostgreSQL) for semantic search. Time-series telemetry flows to GreptimeDB (an open-source distributed time-series database) for temporal analysis. This dual-database approach enables agents to query both semantic similarity (finding related past incidents) and temporal patterns (identifying performance trends).
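A semantic-similarity lookup against PgVector can be expressed with its cosine-distance operator (`<=>`). The table and column names below are hypothetical, since the paper does not publish the schema; the query shape itself is standard pgvector usage.

```python
def similar_incidents_sql(k: int = 3) -> str:
    """Build a parameterized PgVector query retrieving the k historical
    incidents closest (by cosine distance) to a query embedding.
    Table and column names are illustrative."""
    return (
        "SELECT incident_id, summary, embedding <=> %(qvec)s AS distance "
        "FROM incident_history "
        "ORDER BY embedding <=> %(qvec)s "
        f"LIMIT {k};"
    )

# A temporal query would go to GreptimeDB instead, e.g. aggregating a QoS
# metric over a time window -- the two stores answer different questions.
print(similar_incidents_sql(3))
```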
The third stage executes the agent workflows. When the NOC Analyst Agent detects a warning or error condition in incoming logs, it forwards the issue to the Operations Manager Agent. The Operations Manager Agent classifies the problem type and delegates to the appropriate specialist. The Communication Link Monitor handles connectivity failures, the Synchronization Specialist addresses timing issues, and the Hardware Health Agent manages equipment malfunctions. Each specialist agent queries the vector database for similar past incidents and uses the Qwen3-235B reasoning model to determine root cause and corresponding corrective actions.
Dashboard Capabilities

Figure 3 | Unified Command Center Dashboard
The final stage delivers actionable intelligence through a unified dashboard. The command center interface provides interactive Leaflet maps with geographic network visualization, real-time event streams, an agent chat interface, and GPU cluster monitoring within a single dashboard. Key components include:
- Base Station Status: Network operations staff see color-coded health indicators at a glance, eliminating the need to parse raw log files during active incidents. Green indicates normal operations, orange signals early warnings where agents are monitoring for potential degradation, and red indicates active issues that trigger automated resolution workflows.
- Global QoS Metrics: Operations teams monitor live performance without switching between multiple tools. The dashboard streams block rate, data drop, and call drop percentages alongside downlink and uplink throughput levels. Agents automatically correlate anomalies with underlying network or hardware events.
- Event Streams and Active Agents: Supervisors track workload distribution across the agent team in real time. The panel displays event counts from BBU, RRH, and OSS logs alongside active agent workflows, processed incidents, and monitored devices.
- Issue Alerts Panel: Engineers drill down from high-level alerts to root cause analysis without context switching. The interactive incident list maps affected RRHs, BBUs, and base stations with visual indicators for ongoing, resolved, or escalated issues.
- Report Generation: Compliance and operations teams export documentation for audits and post-incident reviews. Specific Incident Reports deliver detailed root cause analysis (RCA) with corrective actions and affected components. Executive Summary Reports aggregate insights across all base stations. Both formats export as JSON or PDF.
Example Scenario: Fronthaul Link Failure
To illustrate how these components work together, consider a fronthaul link failure scenario. At 2:14 AM, the communication link between BBU-12 and RRH-47 fails due to a firmware mismatch introduced during a routine update. Within seconds, the NOC Analyst Agent detects the ERROR log in the incoming telemetry stream and forwards it to the Operations Manager Agent. The Operations Manager classifies the issue as a connectivity failure and delegates to the Communication Link Monitor.
The Communication Link Monitor queries the vector database for similar past incidents and identifies three prior cases where firmware version mismatches caused identical symptoms. Using the Qwen3-235B reasoning model, the agent confirms the root cause, determines that RRH-47 requires a firmware rollback, and initiates an automated link restart sequence. The entire detection-to-remediation cycle completes in less than 90 seconds under moderate, single-incident load.
Under production conditions with concurrent monitoring of 30 BBUs and 90 RRHs, however, average workflow durations range from 102 to 463 seconds depending on issue complexity, as detailed in the Performance Benchmarking section.
Without the multi-agent system, this incident would typically wait for the next batch processing cycle, potentially leaving customers without service for hours. Operators might also need to dispatch field technicians, adding cost and further extending resolution time. With the multi-agent system, the autonomous workflow compresses a multi-hour outage into a remediation event completed in under two minutes.
Solution Architecture
The architectural decisions behind this solution reflect a fundamental requirement: every component must deliver the throughput needed for real-time network intelligence while operating entirely within the telecom operator's secure infrastructure. The platform combines optimized inference runtimes and a modular software stack designed for continuous operation at telecom-grade reliability.

Figure 4 | Solution Architecture
Software Stack
The software architecture layers optimized runtimes atop the AMD Radeon Open Compute platform (ROCm) 7.0. vLLM provides the inference runtime, delivering high-throughput token generation through continuous batching and PagedAttention memory management. This combination enables the solution to serve multiple concurrent agent requests while maintaining consistent, predictable latency for time-sensitive incident response.
Model Provisioning and Lifecycle Management
Production AI model deployment in telecom environments demands a streamlined path from selection through validation to runtime serving. This solution provisions the Qwen3-235B reasoning model and bge-large-en embedding model through Dell Enterprise Hub, integrated with the Hugging Face model repository. Dell Enterprise Hub provides pre-validated configurations optimized for Dell PowerEdge servers with AMD Instinct accelerators, ensuring compatibility across the hardware and software stack. The operational benefits of this approach are detailed in the Operational Ecosystem section.
For telecom operators managing multiple NOC sites, this centralized model management approach ensures consistency across deployments: every site runs the same validated model version with the same serving configuration, reducing the risk of inconsistent agent behavior across the network.
| Layer | Component | Function |
|---|---|---|
| Hardware Optimization | AMD ROCm 7.0 | GPU compute and memory management |
| Inference Runtime | vLLM v0.10.1 | High-throughput model serving with continuous batching |
| Agent Framework | AutoGen (Microsoft) | Multi-agent orchestration and asynchronous task execution |
| Agent Communication | MCP + A2A Protocol | Context sharing and inter-agent coordination |
| Reasoning Model | Qwen3-235B-A22B-Thinking | Root cause analysis and corrective action determination |
| Embedding Model | bge-large-en | Text embeddings for semantic search and similarity matching |
| Vector Database | PgVector + PostgreSQL | Vector storage, similarity search, and metadata management |
| Time-Series Database | GreptimeDB | High-frequency telemetry and event data storage |
| Event Streaming | Vector Data Pipeline | Real-time ingestion, transformation, and routing |
Table 3 | Software Stack Components
Agent Orchestration
The AutoGen framework (Microsoft's open-source multi-agent orchestration framework) coordinates the specialized agents that execute the monitoring and remediation workflows. The Model Context Protocol (MCP) acts as a standardized interface for context sharing across agents, while the Agent-to-Agent (A2A) protocol ensures secure, structured communication between microservices. Together, these components form the control plane that handles request routing, inter-agent orchestration, and operational telemetry for distributed AI workflows with complete traceability and observability.
Each agent accesses shared resources through well-defined interfaces: the vector database for semantic search across historical incidents, GreptimeDB for time-series telemetry analysis, and the model inference endpoints for AI-powered reasoning. This modular architecture enables independent scaling of individual components, such as adding embedding model replicas during peak event volumes, without disrupting active monitoring workflows. On a single server, the architecture processes thousands of network messages and log streams per day across hundreds of nodes, while supporting horizontal scaling for larger deployments.
The orchestration layer also handles agent failure recovery. If a specialist agent encounters an error during root cause analysis, the Operations Manager Agent reassigns the task to a backup agent or escalates to the reporting layer with a partial analysis. This fault-tolerant design ensures that a single agent failure does not leave a network incident unaddressed. All inter-agent messages and task handoffs are logged, providing full traceability for post-incident audit and compliance review.
Server Health and Uptime Management
A platform responsible for continuous network monitoring must itself maintain continuous availability. The Integrated Dell Remote Access Controller (iDRAC) provides out-of-band server management that operates independently of the host operating system and application stack. This independence is critical: if a software fault or GPU driver issue affects the monitoring application, iDRAC remains accessible for diagnostics and recovery.
iDRAC monitors the physical health of the Dell PowerEdge XE9785L server, including CPU and GPU temperatures, power supply status, fan speeds, memory integrity, and storage health. For the Multi-Agent Infrastructure Monitoring solution, this hardware-level visibility serves two functions.
First, iDRAC provides proactive alerting for server-side issues that could degrade monitoring performance. If a GPU accelerator begins exhibiting elevated temperatures or a power supply enters a degraded state, iDRAC generates alerts through standard protocols (SNMP, Redfish, email) before performance or agent responsiveness is affected. Operations teams can schedule maintenance during planned windows rather than responding to unplanned outages.
Second, iDRAC enables remote management for geographically distributed deployments. Telecom operators deploying the monitoring solution across multiple data centers or central offices can use iDRAC's remote console, firmware update, and power management capabilities without dispatching technicians. iDRAC's Redfish API also enables programmatic integration with existing IT service management (ITSM) platforms, providing a unified view of both the telecom infrastructure being monitored and the AI infrastructure performing the monitoring.
Safety and Human Oversight
The multi-agent system operates autonomously for common failure modes that have well-established remediation procedures. For high-impact actions, the architecture includes configurable approval gates. Operators can configure the system to require human confirmation before executing actions that affect multiple base stations, modify network configurations, or trigger firmware changes across device groups. All automated actions are logged with full evidence chains, enabling post-incident review and continuous policy refinement.
This graduated autonomy model recognizes that telecom operators maintain safety-critical responsibilities. Routine issue resolution (single-link restarts, individual device resets) proceeds automatically. Actions with a broader blast radius (multi-device firmware rollbacks, network-wide configuration changes) pause for operator approval through the dashboard interface. Operations teams can adjust these thresholds as they gain confidence in the system's recommendations over time.
Infrastructure Foundation
The agent workflows, inference throughput, and rapid remediation cycles described above place specific demands on the underlying hardware. The platform requires sustained GPU memory bandwidth for large-model inference, sufficient accelerator memory to host 235 billion parameters without multi-server distribution, and enough compute headroom for concurrent agent workloads alongside continuous log ingestion. The Dell PowerEdge XE9785L server equipped with AMD Instinct MI355X accelerators meets these requirements within a single-server footprint.
| Component | Specification |
|---|---|
| Server Platform | Dell PowerEdge XE9785L Server |
| Form Factor | 8U Rack Server |
| GPU Accelerators | 8x AMD Instinct MI355X Accelerators |
| GPU Memory | 2.3 TB aggregate HBM3e (288 GB per accelerator) |
| CPU | AMD EPYC Processor |
| Operating System | Ubuntu 22.04.5 LTS |
Table 4 | Dell PowerEdge XE9785L Hardware Configuration
The AMD Instinct MI355X accelerator provides 288 GB of HBM3e memory per accelerator, a 50 percent increase over the 192 GB available on the previous-generation MI300X. This expanded capacity directly enables efficient deployment of the Qwen3-235B reasoning model. Two MI355X accelerators can host the entire 235-billion-parameter model without aggressive quantization or distribution across multiple servers, either of which would introduce latency and operational complexity.
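The fit claim follows from simple arithmetic, sketched below under the assumption of one byte per parameter at FP8; KV cache and activations need additional headroom beyond the weight figure.

```python
# Back-of-envelope weight-memory check for Qwen3-235B on two MI355X GPUs.
params = 235e9
bytes_per_param = 1.0            # FP8 weights (assumption: 1 byte/param)
weights_gb = params * bytes_per_param / 1e9
capacity_gb = 2 * 288            # two MI355X accelerators at 288 GB each

print(f"weights ~= {weights_gb:.0f} GB of {capacity_gb} GB")  # weights ~= 235 GB of 576 GB
```

The remaining ~340 GB across the pair covers KV cache for concurrent agent requests, which is why no cross-server sharding is needed.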
With 2.3 TB of aggregate GPU memory across eight accelerators, the XE9785L hosts both the reasoning model and embedding model simultaneously while providing capacity for concurrent agent workloads. Memory-intensive operations, including serving multiple model instances, fine-tuning, and running concurrent reasoning agents, are all feasible on a single server. This consolidation simplifies procurement, reduces data center footprint, and eliminates inter-server communication latency that would degrade real-time incident response performance.
In the benchmarked configuration, two MI355X accelerators host the Qwen3-235B reasoning model in tensor-parallel mode. Two additional accelerators serve embedding model replicas to maintain retrieval throughput during peak event ingestion. The remaining four accelerators provide operational headroom: memory capacity for expanded inference context windows under concurrent agent workloads, queuing buffers during burst event periods, and reserved capacity for model updates or secondary model evaluation without interrupting active monitoring. This allocation ensures that the 30-BBU workload operates well within the server's resource envelope rather than at its ceiling, a design margin that production telecom deployments require for sustained reliability.
The AMD EPYC processor handles CPU-bound preprocessing: event streaming through the Vector Data Pipeline, log transformation, and database operations. During peak ingestion periods when network events generate high volumes of telemetry data, the high-core-count processor prevents CPU bottlenecks from limiting pipeline throughput.
ROCm and the vLLM Inference Pipeline
The inference performance demonstrated in this paper depends on the tight integration between AMD ROCm and the vLLM serving framework. ROCm 7.0 provides the GPU kernel libraries, memory management primitives, and inter-GPU communication layers that vLLM uses to implement continuous batching and PagedAttention. For the Qwen3-235B model deployed across two MI355X accelerators, ROCm manages tensor parallel inference with minimal inter-GPU communication overhead, delivering the 7,773 tokens-per-second throughput measured at peak load.
ROCm supports the Hugging Face model format natively, so new models published to the Hugging Face Hub (and curated through Dell Enterprise Hub) deploy on MI355X accelerators without format conversion or custom compilation. When the next generation of reasoning models becomes available, operators can evaluate them on existing hardware by updating the model weights in vLLM, with ROCm handling the low-level GPU resource management automatically. This upgrade path protects the hardware investment and ensures the monitoring solution can evolve as AI model capabilities advance.
Operational Ecosystem: From Deployment to Production
The Dell PowerEdge XE9785L server's value extends beyond raw compute performance. Three ecosystem capabilities elevate the server from a standalone inference platform into a managed, production-grade AI infrastructure component.
Dell Enterprise Hub, integrated with the Hugging Face model repository, provides a curated path from model selection to validated deployment. Operations teams select models from a catalog pre-validated against the XE9785L hardware configuration, covering MI355X memory capacity, ROCm version compatibility, and vLLM serving parameters. The platform generates deployment configurations and tracks model versions across the fleet, ensuring uniformity across multi-site telecom deployments and providing the change management audit trail that regulatory compliance requires.
Dell iDRAC delivers out-of-band server management that operates independently of the AI application stack. iDRAC continuously monitors GPU temperatures, power supply health, storage integrity, and fan performance, issuing proactive alerts before hardware issues impact monitoring capability. iDRAC's Redfish API enables integration with existing ITSM platforms, providing a unified view of both the telecom network being monitored and the AI infrastructure performing the monitoring. Its remote console and firmware management capabilities reduce the need for on-site technician visits, which is particularly valuable for deployments at edge locations or central offices with limited physical access.
AMD ROCm's open-source platform ensures that the entire inference stack remains auditable, portable, and free from proprietary lock-in. Models and pipelines built on standard frameworks run on ROCm without requiring framework-level modifications. Telecom security teams can inspect the GPU runtime codebase as part of their infrastructure certification process. When next-generation AMD Instinct accelerators become available, existing model deployments and serving configurations can be migrated forward without application-level changes, protecting the operator's multi-year infrastructure investment.
Approach Comparison
| Capability | Manual NOC Operations | Cloud-Based AIOps | Multi-Agent on Dell PowerEdge |
|---|---|---|---|
| Detection Latency | Batch cycle (minutes to hours) | Near real-time (cloud dependent) | Continuous monitoring, sub-minute detection |
| Root Cause Analysis | Manual log correlation | AI-assisted, requires data upload | Autonomous, RAG-powered |
| Remediation | Manual execution | Recommendation with manual execution | Automated with audit trail |
| Data Sovereignty | On-premises | Data leaves perimeter | On-premises, fully controlled |
| Scalability | Linear staff increase | Cloud-elastic, variable cost | Single-server, deterministic cost |
Table 5 | Monitoring Approach Comparison
Note: This comparison reflects architectural capabilities. The multi-agent solution column represents tested performance under simulated workloads at scales up to 30 BBUs / 90 RRHs. Cloud-based AIOps capabilities vary by vendor and deployment model.
Performance Benchmarking
Performance validation demonstrates that the solution scales effectively across increasing network complexity while maintaining the throughput required for real-time incident detection and remediation. Testing measured inference throughput, event processing capacity, and GPU resource utilization under production-representative workloads.
Test Configuration
The team conducted benchmarks on a Dell PowerEdge XE9785L server equipped with eight AMD Instinct MI355X accelerators. The solution deployed the Qwen3-235B-A22B-Thinking model in FP8 precision using vLLM v0.10.1 optimized for AMD ROCm 7.0. The reasoning model ran in tensor-parallel mode across two MI355X accelerators. The remaining six accelerators hosted the bge-large-en embedding model replicas and provided memory headroom for concurrent agent context windows, vector retrieval operations, and inference request queuing.
Testing targeted representative configurations for small (3 to 6 BBUs), medium (15 BBUs), and large (30 BBUs) deployments. Each configuration ran under sustained load to capture steady-state throughput and resource utilization characteristics.
Scalability Results
Event processing capacity continues to scale as monitoring scope increases, though throughput per monitored device declines at the larger configurations. At the maximum tested configuration of 30 BBUs and 90 RRHs, the system sustained 2,253 events per minute while generating 7,773 tokens per second of inference throughput.
| BBUs Monitored | RRHs Monitored | Events Processed/min | Throughput (tokens/sec) |
|---|---|---|---|
| 3 | 9 | 539 | 2,231 |
| 15 | 45 | 1,515 | 6,823 |
| 30 | 90 | 2,253 | 7,773 |
Table 6 | Scalability Performance on Dell PowerEdge XE9785L with AMD Instinct MI355X
At the largest tested configuration (30 BBUs / 90 RRHs), the system processes over 2,200 events per minute and completes typical workflows in 2 to 8 minutes, compressing hours of manual troubleshooting into minutes of automated response. To put these numbers in operational context, the system resolves most incidents within minutes of detection, with average workflow durations of 102 to 463 seconds depending on complexity. Against the industry-estimated $100,000-per-hour cost of unplanned downtime cited earlier, even modest reductions in mean time to resolution translate directly into avoided revenue loss and reduced customer churn.
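The operational stakes can be made concrete with a back-of-envelope calculation from the paper's figures. The two-hour manual cycle below is an assumption chosen for illustration, not a measured baseline.

```python
DOWNTIME_COST_PER_HOUR = 100_000  # industry estimate cited in this paper [2]

def avoided_cost(manual_hours: float, automated_seconds: float) -> float:
    """Avoided downtime cost for one incident when automated resolution
    replaces a manual troubleshooting cycle (illustrative arithmetic)."""
    return DOWNTIME_COST_PER_HOUR * (manual_hours - automated_seconds / 3600)

# Assumed 2-hour manual cycle replaced by the worst-case 463-second workflow:
print(round(avoided_cost(2.0, 463)))  # 187139
```

Even under conservative assumptions, a single avoided multi-hour outage covers a meaningful share of operating cost.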
Generational Comparison: MI355X vs. MI300X
To quantify the generational improvement, the same Qwen3-235B-A22B-Thinking model was benchmarked on both MI355X and MI300X accelerators using Dell PowerEdge XE9785L and XE9680 servers respectively.

Figure 5 | Events/Minute Scalability

Figure 6 | Tokens/Second Throughput
At the maximum tested configuration of 30 BBUs, the MI355X achieves 40 percent higher inference throughput (7,773 vs. 5,476 tokens per second) and 50 percent greater event processing capacity (2,253 vs. 1,470 events per minute) compared to the MI300X. This performance gap widens at mid-range configurations: at 15 BBUs, the MI355X processes 57 percent more inference tokens per second (6,823 vs. 4,342) and nearly double the events per minute (1,515 vs. 766).
This generational improvement is attributable to architectural enhancements, most notably the MI355X's expanded 288 GB HBM3e per accelerator (versus 192 GB on the MI300X), which reduces the memory management overhead that constrains inference speed at scale. With more memory per GPU, tensor parallel inference across two accelerators operates with lower resource contention, enabling higher sustained throughput under concurrent agent workloads. Additional factors contributing to the throughput gain include HBM3e bandwidth improvements and CDNA 4 compute architecture enhancements. Isolating the precise contribution of each factor requires controlled single-variable testing beyond the scope of this benchmark. For telecom operators evaluating infrastructure investments, these gains translate into expanded monitoring coverage and faster incident response within the same single-server footprint.
Latency Profile
End-to-end latency measurements confirm that the solution satisfies real-time operational requirements. Simple issue classification (such as confirming that an INFO-level log requires no action) completes in approximately 45 seconds. Complex multi-component root cause analysis with automated remediation requires up to 463 seconds. The following metrics capture the range across all tested scenarios:
- Average agent workflow duration: 102 to 463 seconds, depending on complexity
- Minimum workflow completion: 45 seconds for straightforward issue classification
- Time to first token (TTFT): 200 to 245ms for inference requests (log ingestion to model response, not the full NOC workflow)
- P95 end-to-end workflow latency (covering log ingestion, reasoning, and remediation): 43 seconds at low concurrency (3 BBUs) to 390 seconds at maximum tested load (30 BBUs), varying by query complexity and concurrent agent activity
These latency figures align with the rapid remediation demonstrated in single-incident scenarios. For straightforward failures, the system compresses a multi-hour manual outage into a resolution completed in under two minutes.
Limitations and Considerations
This architecture is validated for deployments of up to 30 BBUs and 90 RRHs per server. These benchmarking results reflect controlled test scenarios using simulated telecom log data. Production deployments may experience different throughput characteristics depending on log volume, event complexity, and the number of concurrent agent workflows.
The current implementation supports English-language log formats. Networks generating logs in other languages or non-standard formats may require additional parsing configuration.
Automated remediation actions demonstrated here (firmware rollback, link restart) represent common, well-understood failure modes. Complex multi-vendor interoperability issues may still require human escalation. The configurable approval gates described in the Safety and Human Oversight section give operators control over which actions proceed autonomously.
Finally, the 30-BBU configuration represents a large single-site deployment. At that scale, the inference engine queued 257 requests at peak, indicating that operators approaching this capacity should evaluate additional accelerator resources or model optimization strategies. Operators with substantially larger networks should plan for multi-server scaling.
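As a rough sanity check on that queueing figure, Little's law (L = λW) relates the reported peak queue depth and sustained event rate to an implied average queueing delay. Treating the peak depth of 257 as a steady-state average is a simplifying assumption, so this is an order-of-magnitude estimate only.

```python
# Little's law: L = lambda * W, so W = L / lambda.
# Inputs are the reported figures; steady-state behavior is assumed.
peak_queue_depth = 257                         # requests queued at peak (reported)
events_per_min = 2253                          # sustained event rate (reported)

arrival_rate = events_per_min / 60.0           # ~37.6 requests/s
avg_wait_s = peak_queue_depth / arrival_rate   # implied time spent queued
print(f"Implied average queueing delay: {avg_wait_s:.1f} s")
```

An implied delay of several seconds per request is consistent with the elevated P95 workflow latency observed at the 30-BBU configuration.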
Conclusion
Telecom operators face a fundamental choice: continue scaling manual processes that cannot keep pace with network complexity, or deploy autonomous systems that detect and resolve incidents faster than human teams can respond. The benchmarks and architecture presented in this paper demonstrate that the second path is now technically viable, with performance validated at scales representative of single-site C-RAN deployments using simulated operational data.
A coordinated team of specialized AI agents transforms network operations from reactive troubleshooting into continuous, proactive infrastructure management. These agents monitor around the clock for link failures, synchronization issues, hardware malfunctions, and performance degradation, all without human initiation. When incidents occur, the Qwen3-235B reasoning model correlates current events with historical patterns retrieved from the vector database, delivering root-cause diagnoses in seconds for well-characterized failures. For isolated incidents, common failure modes resolve in under two minutes, reducing mean time to resolution from hours to minutes.
Beyond incident response, the platform provides unified visibility through a single command center dashboard: geographic network visualization, live QoS metrics, event streams, and agent workflow status across all distributed base station infrastructure. Every automated action generates a complete audit trail, supporting compliance requirements and enabling continuous improvement through post-incident review. Dell iDRAC ensures the monitoring platform itself maintains the uptime that telecom operations demand, with out-of-band health management, proactive alerting, and remote administration.
The Dell PowerEdge XE9785L server with AMD Instinct MI355X accelerators provides the memory and compute density to run these workloads entirely on premises. Operators can deploy a frontier-scale reasoning model alongside embedding models and concurrent agent workflows on a single server, with no logs or telemetry leaving the network perimeter. This on-premises architecture eliminates cloud dependencies and external API calls that would introduce latency and data sovereignty concerns.
As network traffic grows and 5G deployments expand, the gap between manual monitoring capabilities and operational demands continues to widen. Operators that invest in autonomous monitoring infrastructure today, before that gap widens further, position themselves for higher service quality, lower operational costs, and faster response to network events.
To learn more about implementing this solution, contact Dell Technologies or request access to reference code at contact@metrum.ai.
Addendum: Key Concepts for IT Decision Makers
What is RAG, and why is it critical for enterprises?
Retrieval-Augmented Generation (RAG) is a natural language processing technique that enhances generated responses by incorporating external knowledge retrieved from a large corpus or database. This approach combines the strengths of retrieval-based models and generative models to deliver more accurate, informative, and contextually relevant outputs.
The key advantage of RAG is its ability to dynamically leverage external knowledge, allowing the model to generate responses informed not only by its training data but also by up-to-date and detailed information from the retrieval phase. This makes RAG particularly valuable in applications where factual accuracy and comprehensive details are essential, such as in network operations, incident management, and other fields that require precise information. RAG gives enterprises a practical mechanism for improving the accuracy, relevance, and efficiency of their information systems.
Why is Dell PowerEdge XE9785L with AMD Instinct MI355X well-suited for RAG solutions?
The Dell PowerEdge XE9785L server supports high-density GPU acceleration (up to eight MI355X accelerators) within a high-performance system architecture, making it well-suited for AI workloads that involve training, fine-tuning, and inference with large language models.
Effectively implementing RAG solutions requires robust hardware infrastructure that can handle both the retrieval and generation components. Key hardware features for RAG solutions include high-performance accelerator units and large memory and storage capacity. With 288 GB of HBM3e memory per GPU, a single AMD Instinct MI355X accelerator can host very large LLMs and their associated working memory. Optimized for generative AI, the MI355X accelerator delivers leadership AI/HPC performance and provides the memory bandwidth and compute density needed to drive high-throughput inference and generation in RAG pipelines.
What are Multi-Specialist Agents and Multi-Agent Frameworks?
Multi-Specialist Agents are domain-focused AI agents designed with specialized expertise to address distinct aspects of complex operational workflows. Each agent operates autonomously within its area of specialization, such as network diagnostics, hardware health, communication link analysis, or report generation, while coordinating with other agents to achieve a shared operational goal. These agents use reasoning models, contextual data retrieval, and adaptive decision-making to analyze issues, execute corrective actions, and generate insights in real time.
A Multi-Agent Framework refers to a coordinated system where multiple specialist agents collaborate dynamically to solve interrelated problems across different domains. In this framework, agents communicate, delegate tasks, and share context through structured workflows, ensuring that each task is handled by the most capable specialist. For example, in the telecom C-RAN monitoring solution, the Operations Manager Agent delegates tasks to domain-specific agents such as the NOC Analyst, Communication Link Monitor, Hardware Health Agent, and Reporting Agent.
By combining the intelligence of multiple specialized agents, the Multi-Agent Framework enables autonomous detection, analysis, and resolution of incidents across large-scale infrastructures. It ensures faster root-cause identification, reduced downtime, and comprehensive reporting through continuous collaboration and reasoning between agents. This architecture represents a key advancement toward self-governing AI systems capable of managing complex, real-time operational environments.
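The delegation pattern described above can be sketched in a few lines. The agent names follow the roles named in this paper, but the keyword-based routing and event strings are illustrative assumptions; production frameworks such as AutoGen coordinate agents through richer conversation-based workflows.

```python
class SpecialistAgent:
    """A domain-focused agent that claims events matching its keywords."""
    def __init__(self, name, keywords):
        self.name = name
        self.keywords = keywords

    def can_handle(self, event):
        return any(k in event.lower() for k in self.keywords)

    def handle(self, event):
        return f"{self.name} analyzing: {event}"

class OperationsManagerAgent:
    """Routes each incoming event to the first specialist whose domain matches,
    escalating to a human operator when no specialist claims it."""
    def __init__(self, specialists):
        self.specialists = specialists

    def delegate(self, event):
        for agent in self.specialists:
            if agent.can_handle(event):
                return agent.handle(event)
        return f"Escalating to human operator: {event}"

manager = OperationsManagerAgent([
    SpecialistAgent("Communication Link Monitor", ["link", "fiber", "cpri"]),
    SpecialistAgent("Hardware Health Agent", ["temperature", "fan", "psu"]),
    SpecialistAgent("NOC Analyst", ["sync", "alarm", "kpi"]),
])

print(manager.delegate("RRH-12 link down on sector 3"))
```

The explicit escalation path mirrors the configurable approval gates discussed earlier: anything outside the specialists' well-understood domains falls through to a human.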
System Under Test
| Component | Detail |
|---|---|
| Server Platform | Dell PowerEdge XE9785L Server |
| GPU Accelerators | 8x AMD Instinct MI355X Accelerator (288 GB HBM3e each) |
| CPU | AMD EPYC Processor (high core count) |
| Operating System | Ubuntu 22.04.5 LTS |
| Hardware Optimization | AMD ROCm 7.0 |
| Inference Runtime | vLLM v0.10.1 |
| Reasoning Model | Qwen3-235B-A22B-Thinking |
| Embedding Model | bge-large-en |
| Vector Database | PgVector + PostgreSQL |
| Time-Series Database | GreptimeDB |
| Agent Framework | AutoGen (Microsoft) |
Table 8 | System Under Test Configuration
Glossary of Technical Terms
| Term | Definition |
|---|---|
| A2A | Agent-to-Agent protocol for secure, structured communication between AI agent microservices |
| AutoGen | Microsoft's open-source multi-agent orchestration framework for coordinating AI agent workflows |
| BBU | Baseband Unit; centralized equipment for baseband signal processing in C-RAN architectures |
| bge-large-en | An open-source text embedding model used for semantic search and similarity matching |
| C-RAN | Cloud Radio Access Network; architecture that centralizes baseband processing while distributing radio units |
| GreptimeDB | An open-source distributed time-series database optimized for high-frequency telemetry data |
| HBM3e | High Bandwidth Memory 3e; latest generation high-bandwidth memory for GPU accelerators |
| iDRAC | Integrated Dell Remote Access Controller; out-of-band server management platform |
| MCP | Model Context Protocol; standardized interface for context sharing across AI agents |
| NVMe | Non-Volatile Memory Express; high-speed storage interface protocol |
| OSS | Operations Support Systems; software tools for network monitoring, fault management, and performance optimization |
| PagedAttention | Memory management technique for efficient GPU memory allocation during LLM inference |
| PgVector | A vector similarity search extension for PostgreSQL databases |
| QoS | Quality of Service; performance metrics ensuring network meets service level requirements |
| RAG | Retrieval-Augmented Generation; method combining document retrieval with AI text generation |
| ROCm | Radeon Open Compute platform; AMD's open-source GPU computing software platform |
| RRH | Remote Radio Head; distributed equipment handling RF processing at cell sites |
| TTFT | Time to First Token; latency measure for the initial response from an LLM inference request |
| vLLM | Open-source high-throughput inference engine for serving large language models |
Table 9 | Glossary
References
[1] Ericsson, "Ericsson Mobility Report, November 2024," Ericsson AB, Stockholm, Sweden, Nov. 2024. [Online]. Available: https://www.ericsson.com/en/reports-and-papers/mobility-report
[2] TM Forum, "Network Performance Benchmarking Report," TM Forum, 2024. See also: Analysys Mason, "Telecoms Network Downtime: Cost and Impact Analysis," Analysys Mason Ltd., London, U.K., 2023.
Image Sources
Dell Images: Dell Technologies Inc. Dell PowerEdge XE9785L Server. Image source: Dell DAM via Dell.com
AMD Images: AMD Inc. AMD Instinct MI300X, AMD Instinct MI355X Accelerator. Image source: AMD Media Library (https://library.amd.com)
Copyright 2026 Metrum AI, Inc. All Rights Reserved. This project was commissioned by Dell Technologies. Dell, Dell PowerEdge and other trademarks are trademarks of Dell Inc. or its subsidiaries. AMD, Instinct, ROCm, EPYC and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other product names are the trademarks of their respective owners.
DISCLAIMER - Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance.