Table Of Contents

| Cross-Industry Applications of CPU-Powered Agentic RAG

| Introduction

Today's manufacturing facilities operate on complex ecosystems of interconnected machinery, from CNC equipment and robotic arms to assembly lines and injection molding systems, all managed through SCADA and MES software. Despite these technological advancements, these systems struggle to deliver the speed and intelligence needed for real-time equipment management, ultimately resulting in efficiency losses and increased operational costs.

Metrum AI offers a solution to these challenges by leveraging small language models (SLMs) and agentic retrieval-augmented generation (RAG) techniques to streamline manufacturing operations. This AI agent is capable of operating semi-autonomously to complete tasks like anomaly detection in operational environments. Deployed on the latest Dell PowerEdge R7725 servers featuring 5th Gen AMD EPYC 9755 128-Core processors, this solution dramatically reduces unplanned machine downtime and extends equipment lifespan without requiring specialized AI infrastructure.

By running entirely on CPUs, Metrum AI's solution allows manufacturing organizations to leverage their existing CPU-based infrastructure, eliminating the need for costly specialized hardware. CPUs are versatile, capable of supporting mixed workloads, and widely available in most industrial environments, simplifying deployment and reducing overall costs. This innovative manufacturing operations solution demonstrates the capability of modern CPU-based systems powered by AMD EPYC processors to efficiently execute full agentic RAG workflows, with scalability across diverse production scenarios.

| Key Highlights

| Solution Overview

Metrum AI's manufacturing operations solution transforms production environments by leveraging agentic RAG, powered by cutting-edge software components including a vector database, embeddings model, and small language model, to autonomously convert raw machine sensor data into actionable insights.

This solution focuses on bottle capping operations, simulating a realistic production line scenario by ingesting simulated pressure, torque, vibration, and audio sensor data from OPC UA publishers and SCADA systems. The data is loaded into a time series database, where it is contextually enriched and vectorized by an embeddings model and again stored in a vector database. When anomalies occur, an agentic RAG framework analyzes patterns against a knowledge base of historical error data, autonomously identifying root causes of issues like torque irregularities or pressure fluctuations in capping equipment. The system then generates actionable insights delivered through an interactive dashboard and targeted notifications, enabling maintenance teams to resolve issues before production stoppages occur. Running entirely on Dell PowerEdge R7725 servers with AMD EPYC 9755 processors, this CPU-optimized solution continuously learns from operational outcomes, cutting unplanned downtime and extending equipment lifespan without the need for specialized AI infrastructure.

Figure 1. Screenshot of Manufacturing Operations Solution User Interface.

Figure 2. vLLM Model Serving Performance of Llama 3.2 3B with BF16 Precision

This graph illustrates the throughput, measured in tokens per second, as a function of the number of concurrent requests.

This solution demanded an infrastructure platform delivering exceptional performance and seamless scalability, leading to our thorough evaluation of available options. The Dell PowerEdge R7725 server equipped with 5th Gen AMD EPYC 9755 processors emerged as the optimal foundation for our solution. We validated this decision through comprehensive vLLM-based performance testing using the cutting-edge Llama 3.2 3B SLM, measuring output token throughput at various concurrency levels:

While these results revealed impressive throughput that scales with increasing concurrent requests, each facility's RAG deployment would rarely need to handle more than a few dozen concurrent users, making the 32 concurrent requests scenario a realistic representation of actual usage patterns. At 32 concurrent requests, our testing demonstrates throughput exceeding 10 tokens per second per request, which meets the industry standard for responsive interactive applications. This performance level ensures that factory personnel can interact with the RAG system in real-time, receiving immediate responses to their queries about manufacturing documentation, operational procedures, and equipment specifications.

Dell PowerEdge servers equipped with EPYC 9755 processors provide more than sufficient capacity for these deployments, ensuring that users experience no perceptible latency when interacting with the system. This hardware configuration strikes an optimal balance between performance and resource allocation for manufacturing environments, where interactive RAG capabilities directly contribute to operational efficiency and decision-making speed on the factory floor.

To support this solution, we chose the Dell PowerEdge R7725 server equipped with 5th Gen AMD EPYC 9755 128-Core processors and high-speed DDR5 6000 memory. This hardware demonstrated exceptional performance in vLLM-based tests, making it ideal for running small language models (SLMs) like Llama 3.2 3B. Additionally, this configuration excels at supporting a range of AI agents, the backbone of our manufacturing operations solution, while maintaining both speed and accuracy.

The table below shows the hardware configuration details for this solution.

Server	Dell PowerEdge R7725 Rack Server
Processor	2x AMD EPYC 9755 128-Core Processors
Memory	24 x 128GB DDR5 Memory (6000 MT/s)
Drive Bays	Dell NVMe PM1743 RI E3.S 3.84TB
Networking	BCM57504 NetXtreme-E (10Gb-200Gb) Ethernet
OS	Ubuntu 24.04.1 LTS

Figure 3. Table of Hardware Configuration Details for the Manufacturing Operations Solution.

| Solution Details and Workflow

Let's explore the core of this solution: an agentic retrieval-augmented generation (RAG)* workflow that automates critical components of manufacturing operations management process. Agentic RAG enhances traditional RAG by integrating autonomous agents capable of breaking down complex tasks, maintaining contextual awareness, and executing specific sub-tasks while collaborating effectively. This advanced approach is essential for processing data from production lines to detect issues, identify root causes, and deliver actionable insights—while incorporating human oversight to ensure precision. Below, we detail the step-by-step data flow within this system:

Figure 4: Solution Workflow.

Data Ingestion: The workflow begins with data ingestion from two critical sources: SCADA and MES via OPC-UA. The SCADA systems simulate and manage pressure readings, torque measurements, and vibration patterns, while the OPC-UA communicates MEMS audio sensor data, all of which are stored in a time series database. The Dell PowerEdge R7725 with AMD EPYC 9755 processors provides exceptional I/O throughput capabilities, enabling the simultaneous ingestion of these sensor data streams without bottlenecks, even during production peak periods.
Contextual Retrieval via Vector Database: As operational data flows into the system, the bge-small-en-v1.5 embeddings model transforms the data into semantic vector representations stored in the Milvus Vector Database. A searchable knowledge base of manufacturing expertise also enhances the solution by contextualizing current operational data against known patterns of equipment behavior and failure modes specific to bottle capping processes. The massive core count of the AMD EPYC 9755 processors efficiently handles the parallel vector transformation workloads, enabling real-time encoding of industrial data streams without requiring GPUs.
Agentic RAG Analysis: At the core of the solution, an agentic RAG workflow powered by LangGraph and Llama 3.2 3B, analyzes these multiple data streams simultaneously. When anomalies are detected in the capping line's operation, the system leverages the language model to analyze and correlate real-time sensor readings with similar historical incidents from the vector database. AI agents autonomously detect root causes of issues by comparing current pressure, torque, vibration, and sound patterns against known failure signatures. The high memory bandwidth and cache architecture of the AMD EPYC 9755 processors are particularly well-suited for hosting all critical software components of agentic RAG, enabling the solution to run with consistently low latency even when handling multiple concurrent requests.
Intelligent Operator Chat Interface: When anomalies are detected in the bottle capping line, operators can interact with the system through an intuitive chat interface that transforms complex diagnostic processes into natural conversations. Operators simply type queries like "List all anomalies in the past 10 minutes," after which the system processes these queries through the agentic RAG framework, delivering contextualized responses that combine real-time sensor data with historical patterns. The interface also showcases model reasoning, showing operators exactly how conclusions were reached. For each anomaly, the system displays its reasoning chain, highlighting the specific sensor patterns it identified, similar historical incidents it referenced, and the sensor abnormalities that led to its diagnosis.
Comprehensive Report Generation: Once root causes are identified, the system generates an issue report along with resolution recommendations delivered through the chat interface. This report provides production managers with issue detection alerts, troubleshooting recommendations, and status reports, allowing them to see the exact nature of equipment issues as they develop.

This entire workflow executes seamlessly on Dell PowerEdge R7725 servers with AMD EPYC 9755 processors, demonstrating how CPU-optimized infrastructure can support advanced AI applications in manufacturing environments without specialized hardware requirements.

| Solution Architecture

Figure 5. Solution Architecture.

The software stack incorporates the following key components to power this solution:

vLLM (v0.5.3.post1): An industry-standard library for optimized serving of open-source large language models (LLMs), featuring support for AMD ROCm 6.1.
llama-deploy: An async-first framework designed for building, iterating, and deploying multi-agent systems in production.
Llama 3.2 3B Model: A leading open-weight small language model with three billion parameters, served using vLLM with AMD ROCm optimizations for enhanced performance.
LangGraph: A widely-used open-source retrieval-augmented generation (RAG) framework.
bge-large-en Embeddings Model: A top-ranked text embeddings model accessible through Hugging Face APIs, known for its semantic accuracy.
MilvusDB: An open-source vector database offering high-performance embedding and similarity search capabilities.

| Cross-Industry Applications of CPU-Powered Agentic RAG

While bottle capping operations provided our test case, this architecture—combining SLMs, embeddings models, and vector databases on Dell PowerEdge servers with 5th Gen AMD EPYC processors—creates a versatile blueprint applicable across numerous industries:

| Summary

Metrum AI's agentic RAG solution for manufacturing operations demonstrates how AI can transform equipment monitoring and maintenance from reactive to proactive. By combining small language models with vector databases and advanced embedding techniques, manufacturers can detect issues before they cause downtime, extend equipment life, and optimize production—all without specialized hardware.

The performance results confirm that Dell PowerEdge R7725 servers with 5th Gen AMD EPYC 9755 processors provide the computational power needed for these AI workloads. This CPU-based approach makes industrial AI accessible to organizations of all sizes without the typical costs and complexity of specialized infrastructure.

As manufacturing evolves toward smarter operations, the ability to turn industrial data into actionable insights will be crucial for staying competitive. Start implementing these technologies today with Dell PowerEdge servers and AMD EPYC processors—a proven foundation for industrial AI that will help you reduce downtime, increase efficiency, and lead in manufacturing innovation.

To learn more about this solution or request access to our reference implementation, please contact us at contact@metrum.ai.

"The advantage of AI agents presents an unparalleled productivity opportunity, enabling AI to operate independently to accomplish tasks with minimal human intervention. With AI agents, we can run agentic RAG workloads or other offline, batched tasks entirely on CPUs, optimizing resource utilization and also significantly enhancing efficiency and scalability."

Chetan Gadgil, CTO at Metrum AI

| References

AMD images: AMD.com, AMD Partner Resource Library, https://www.amd.com/en/partner/resources/resource-library.html
Dell PowerEdge R7725 Rack Server [Image]. Retrieved from https://www.dell.com/en-us/shop/dell-poweredge-servers/new-poweredge-r7725-rack-server/spd/poweredge-r7725/pe_r7725_tm_vi_vp_sb

Addendum

| Appendix A: Key Concepts

This solution leverages the following technical concepts:

OPC-UA Integration
- OPC-UA is a secure, platform-independent communication standard widely adopted in manufacturing for real-time data exchange between equipment. By ingesting OPC-UA data streams, the RAG solution gains direct visibility into machine states—spindle temperatures, vibration profiles, load currents, fault codes, humidity levels in HVAC systems, and more. Additionally, MEMS (Micro-Electro-Mechanical Systems) audio sensors provide another critical data source, capturing high-frequency acoustic signatures that can reveal anomalies such as bearing wear, cavitation, or misalignment before traditional sensors detect issues. This multimodal data feeds the RAG pipeline, enabling the model to identify potential failures early and recommend corrective actions with greater accuracy.
SCADA Systems
- SCADA (Supervisory Control and Data Acquisition) systems are critical for industrial automation, providing real-time monitoring and control of equipment across manufacturing, energy, water treatment, and other sectors. These systems collect and process data from sensors, PLCs, and RTUs, enabling operators to track parameters like pressure, temperature, voltage levels, and flow rates. By integrating SCADA data, a RAG solution can enhance situational awareness, detect anomalies, and provide actionable insights. Combined with additional sensor inputs—such as MEMS audio for early fault detection—SCADA-driven intelligence helps optimize performance, reduce downtime, and improve predictive maintenance strategies.
Retrieval-Augmented Generation (RAG)
- Retrieval-Augmented Generation (RAG) combines retrieval-based and generative NLP models to produce accurate, contextually relevant outputs by incorporating external knowledge from a database or corpus. Our legislative bill analysis solution leverages RAG to dynamically retrieve and integrate relevant legal, economic, and environmental data from uploaded code documents to generate detailed, fact-based summaries and insights. This ensures the solution provides timely, accurate analysis, streamlining decision-making processes for government agencies.
AI Agents and Agentic Workflows
- AI agents are autonomous software tools that process information, make decisions, and take actions to achieve specific goals using techniques like machine learning and natural language processing. In agentic workflows, multiple AI agents collaborate, each with specialized roles, to break complex tasks into smaller steps for more accurate and efficient execution. Our legislative bill analysis solution exemplifies this approach, employing AI agents for distinct tasks like legal, economic, and environmental impact assessments. These agents iteratively refine their outputs, ensuring detailed and reliable insights that support decision-making while streamlining the analysis process for government agencies.

Copyright © 2025 Metrum AI Inc. All Rights Reserved. Metrum AI, the Metrum AI logo, and other trademarks are trademarks of Metrum AI Inc. The analysis in this document was conducted by Metrum AI Inc. and commissioned by Dell Technologies.

Dell Technologies, Dell, Dell PowerEdge, Dell logo, and other trademarks are trademarks of Dell Inc. or its subsidiaries. AMD, AMD logo, AMD EPYC, AMD ROCm, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Other trademarks may be the property of their respective owners.

DISLAIMER: Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance.

Metrum AI believes the information in this document is accurate as of its publication date. The information is subject to change without notice.

Accelerating Manufacturing Operations with Agentic RAG

| Introduction

| Solution Overview

| Solution Details and Workflow

| Solution Architecture

| Cross-Industry Applications of CPU-Powered Agentic RAG

| Summary

Addendum

| Appendix A: Key Concepts