Technical Whitepaper
March 2026

Multi-Agent Risk Analysis and Compliance Monitoring on Dell PowerEdge XE9785L Servers Powered By AMD Instinct MI355X Accelerators

Abstract

Financial institutions spend over $206 billion annually[1] on compliance, yet risk teams still operate with multi-hour blind spots during market events, fragmented data pipelines, and limited capacity to process the 80 percent[2] of institutional data that remains unstructured.


Executive Summary

Financial institutions spend over $206 billion annually[1] on compliance, yet risk teams still operate with multi-hour blind spots during market events, fragmented data pipelines, and limited capacity to process the 80 percent[2] of institutional data that remains unstructured. Legacy batch systems and cloud-first AI tools cannot simultaneously address real-time market velocity and on-premises data sovereignty requirements.

The Institutional Portfolio Risk Agents (IPRA) platform closes this gap. IPRA is a single-server, on-premises portfolio risk agent platform running on a Dell PowerEdge XE9785L with eight AMD Instinct MI355X accelerators. The platform continuously links filings, news, macroeconomic indicators, and regulatory updates to every portfolio position and compliance mandate through a coordinated suite of AI agents. The results that follow demonstrate near-linear throughput scaling to 850 concurrent portfolios on one server, with generational performance gains validated at the raw inference level.

Key Results at a Glance

  • 1,415 holdings analyzed/min: Near-linear scaling from 300 to 850 concurrent portfolios on a single server
  • 109 compliance checks/min: Each assessment covers 20 to 30 rule evaluations per holding with full audit trail
  • 850 concurrent portfolios: 22,100 individual asset positions monitored on a single Dell PowerEdge XE9785L
  • Up to 4.3x gen-on-gen throughput: MI355X vs MI300X on 128/128 workload at 8,192 concurrent requests
  • 2.3 TB combined GPU memory: Single 8-GPU node, no multi-node sharding required
  • 64x compliance concurrency within SLA: MI355X holds sub-100ms TPOT P95 at 8,192 concurrent sessions vs 128 on MI300X
  • Sub-2s median TTFT: At all concurrent request levels tested
  • Up to 2.2x tokens per GPU watt: MI355X efficiency advantage at 2,048+ concurrent requests

The Five Operational Challenges

Five operational challenges consistently surface in conversations with risk and compliance leaders at financial institutions. Table 1 summarizes each challenge, its current state and its quantified business impact, establishing the baseline for improvement by any effective risk intelligence platform.

Challenge | Current State | Business Impact
Delayed Risk Awareness | Batch systems and manual reviews typically operate on multi-hour cycles | Critical exposure changes may go undetected during fast-moving market events
Unstructured Data Overload | ~80% of relevant data resides in filings, transcripts, and news | Analysts struggle to process document volumes at market speed
Compliance Gaps | Mandate checks typically occur only after risk analysis completes | Breaches often surface late, increasing regulatory and audit exposure
Siloed Insights | Market, fundamental, macroeconomic, and regulatory signals often remain disconnected | Correlated risks across portfolios may remain hidden
Data Sovereignty Constraints | Sensitive trading and client data cannot leave the institutional perimeter | Cloud-first AI tools are often constrained by regulatory requirements

Table 1 | Operational Challenges and Business Impact

Addressing these challenges requires a platform purpose-built for continuous, on-premises intelligence. The following sections explain how IPRA unifies fragmented workflows into a continuous, real-time risk and compliance capability on the Dell PowerEdge XE9785L server with eight AMD Instinct MI355X accelerators.

Solution Overview

IPRA's architecture centers on a continuously updated knowledge graph implemented in Neo4j that captures relationships between market events, issuers, sectors, portfolio positions, and compliance mandates. When new information arrives, including a 10-Q filing, breaking news headline, or regulatory notice, the system uses large language and vision models to extract entities and relationships, then maps them to the graph structure using Graphiti. Specialized agents query this shared knowledge graph to determine how incoming events affect portfolio holdings, enabling the platform to trace the impact path from a single news item to all affected positions and mandates.
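The impact-path traversal described above can be illustrated with a minimal in-memory sketch. The production system stores these relationships in Neo4j via Graphiti; the node names, edge labels, and adjacency structure below are illustrative assumptions, not IPRA's actual schema.

```python
from collections import deque

# Illustrative knowledge-graph edges: (source, relation, target).
# In production these live in Neo4j; all names here are hypothetical.
EDGES = [
    ("news:bank_collapse", "MENTIONS", "issuer:BANK_A"),
    ("issuer:BANK_A", "BELONGS_TO", "sector:regional_banks"),
    ("sector:regional_banks", "EXPOSES", "position:PORT1:BANK_B"),
    ("issuer:BANK_A", "HELD_AS", "position:PORT1:BANK_A"),
    ("position:PORT1:BANK_A", "GOVERNED_BY", "mandate:concentration_limit"),
]

def trace_impact(event: str) -> set[str]:
    """Breadth-first traversal from a news event to every
    reachable position and mandate node."""
    adjacency: dict[str, list[str]] = {}
    for src, _rel, dst in EDGES:
        adjacency.setdefault(src, []).append(dst)
    seen, queue, affected = {event}, deque([event]), set()
    while queue:
        node = queue.popleft()
        if node.startswith(("position:", "mandate:")):
            affected.add(node)
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return affected

affected = trace_impact("news:bank_collapse")
```

A single news node reaches both directly held positions and sector-correlated positions, which is the property the agents rely on when assessing cross-portfolio contagion.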

Agent | Function | Operation Mode
Portfolio Impact Agent | Calculates exposure changes when new signals affect portfolio holdings | Continuous
Compliance Monitor Agent | Checks positions against institutional mandates and regulatory rules | Continuous
Regulatory Audit Agent | Maintains audit trails with traceable source links for supervisory review | Continuous
Scenario Stress Agent | Runs stress tests using natural language queries | User-initiated

Table 2 | IPRA Agent Functions

Solution Flow


Figure 1 | Solution Flow

The platform operates through five integrated stages, each running continuously within the institution's secure perimeter. In the first stage, IPRA ingests data from multiple sources: SEC filings, news feeds, fundamental financial data, insider trading disclosures, macroeconomic indicators, and regulatory notices. A Kafka-based event streamer normalizes these inputs and routes them to the processing layer, where GPU-accelerated language models extract structured signals from unstructured content. Vision language models handle documents containing charts, tables, and mixed media. Text models perform summarization, sentiment scoring, and entity extraction.
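The normalize-and-route step can be sketched as a pure function. The real pipeline runs on Apache Kafka; the event types, topic names, and priority values below are illustrative assumptions, not the production configuration.

```python
# Hypothetical routing table: event type -> (processing topic, priority).
# Document-heavy inputs go to the VLM queue; text goes to the NLP queue.
ROUTING = {
    "sec_filing": ("docs.vlm", 1),
    "regulatory": ("docs.vlm", 1),
    "news":       ("text.nlp", 2),
    "macro":      ("text.nlp", 3),
}

def route(raw: dict) -> dict:
    """Normalize a raw feed event and select its processing queue."""
    kind = raw.get("type", "news")
    topic, priority = ROUTING.get(kind, ("text.nlp", 3))
    return {
        "topic": topic,
        "priority": priority,
        "source": raw.get("source", "unknown"),
        "payload": raw.get("body", ""),
    }

msg = route({"type": "sec_filing", "source": "EDGAR", "body": "10-Q ..."})
```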

The extracted signals flow into the knowledge graph, where Graphiti maps relationships between entities such as identifying which issuer published which filing, which sectors face exposure to which regulatory changes, and which portfolio positions connect to which risk factors. This contextualized graph becomes the foundation for the Portfolio Risk Intelligence Layer, where agents continuously evaluate how new information affects holdings, mandates, and compliance status. The Scenario and Impact Evaluation Layer then applies predictive modeling and stress testing to quantify forward-looking risk under varying market conditions.

The final stage delivers decision-ready intelligence to risk officers through a unified dashboard. Users see a real-time risk heatmap organized by issuer and risk category, a "What Changed?" panel that highlights the most significant events and their quantified portfolio impact, compliance status indicators with pass/warn/fail flags, and narrative briefings that explain findings in plain language. Every recommendation links back to its source data, providing the required audit trail for supervisory and regulatory review.


Figure 2 | Example Unified Dashboard

Example Scenarios

To illustrate how these stages operate in practice, consider three scenarios based on real market events.

In a regulatory change scenario, federal regulators publish a final rule recalibrating the enhanced supplementary leverage ratio (eSLR) for systemically important banks. Within minutes of the rule's publication in the Federal Register, IPRA's ingestion layer captures the notice, extracts the specific constraint changes, and maps affected entities to portfolio holdings. The Compliance Monitor Agent automatically generates a side-by-side comparison of prior and updated capital requirements and flags any Global Systemically Important Bank (GSIB) positions that may require rebalancing. Risk officers receive a concise summary of exposure impacts within minutes of publication, often before markets have fully priced in the change.

In a governance shock scenario, a major agricultural company delays its earnings release and announces an accounting investigation, triggering a 24 percent decline in share price at market open. IPRA detects the escalation pattern from the company's 8-K filing and associated news coverage, adjusts the issuer's governance risk score, and flags any portfolio mandates tied to disclosure quality or ESG governance minimums. The Regulatory Audit Agent logs the complete evidence chain for compliance review. The dashboard displays updated risk scores alongside an events timeline, enabling risk officers to trace the full sequence from initial disclosure to portfolio impact.

In a liquidity crisis scenario, a macroeconomic shock triggers a rapid flight to liquidity, commonly referred to as a "dash for cash", where correlations spike across asset classes and even traditional safe havens decline. IPRA's Scenario Stress Agent recognizes the correlation breakdown. Upon the user's acknowledgment, it calculates portfolio-wide NAV drawdown estimates and recommends emergency liquidity protocol activation. The dashboard then displays a portfolio-wide drawdown estimate alongside a correlation matrix illustrating the breakdown of traditional diversification assumptions. Risk officers receive prioritized recommendations for liquidity buffer activation, with each recommendation linked to the source signals and model outputs that informed it.

Solution Architecture

The architectural decisions behind IPRA reflect a fundamental requirement: every component must operate within the institution's secure perimeter while delivering the high-performance computational throughput needed for real-time risk intelligence. The platform combines optimized inference runtimes and a modular software stack designed for continuous operation under regulatory scrutiny.


Figure 3 | Solution Architecture

Software Stack

The software architecture layers optimized runtimes atop AMD's Radeon Open Compute (ROCm) 7.x platform. ROCm combines a hardware-optimized foundation with an open ecosystem approach, enabling institutions to deploy models from any source without vendor lock-in. The platform supports standard frameworks and tools, allowing risk teams to incorporate new models as they become available without rewriting application code. vLLM provides the high-throughput inference runtime, delivering token generation with continuous batching and PagedAttention memory management. This combination enables IPRA to serve multiple concurrent analysis requests while maintaining predictable latency for time-sensitive compliance checks.
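A deployment of the reasoning model under vLLM might look like the following launch command. This is a sketch only: the tensor-parallel degree, sequence limit, and memory utilization values are illustrative assumptions, not the benchmarked configuration.

```shell
# Illustrative only: serve the reasoning model with tensor parallelism
# across a subset of the node's GPUs; tune all values per deployment.
vllm serve amd/Qwen3-235B-A22B-Thinking-2507-ptpc \
    --tensor-parallel-size 4 \
    --max-num-seqs 512 \
    --gpu-memory-utilization 0.90
```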

Layer | Component | Function
Hardware Optimization | AMD ROCm 7.x | GPU compute and memory management
Inference Runtime | vLLM | High-throughput model serving with continuous batching
Event Streaming | Apache Kafka | Real-time data ingestion and routing
Document Processing | Docling | PDF and document parsing with vision-language model (VLM) support
Knowledge Graph | Graphiti + Neo4j | Entity relationship mapping and graph storage
Agent Framework | Microsoft Agent Framework | Multi-agent orchestration and coordination
Relational Database | PostgreSQL | Portfolio data, compliance rules, and audit logs
Authentication | Valkey | Session management and access control
Observability | Prometheus | System metrics and performance monitoring

Table 3 | Software Stack Components

AI Model Deployment

IPRA deploys four specialized models across the GPU array, each optimized for specific tasks within the risk intelligence pipeline. This multi-model approach matches computational requirements to task complexity: lightweight summarization runs on smaller models with multiple replicas for throughput, while document understanding and regulatory reasoning leverage large-scale vision-language and reasoning architectures. Additionally, using Dell Enterprise Hub's integration with Hugging Face simplifies model deployment by providing pre-validated, enterprise-ready model containers. Infrastructure teams can deploy new models through a streamlined workflow rather than building custom inference pipelines from scratch.

Model | Function
Qwen3-VL-235B-A22B-Thinking-FP8 | Vision-language processing for documents containing charts, tables, and mixed media
Qwen3-235B-A22B-Thinking-2507-ptpc | Multi-step reasoning for compliance assessment and complex risk analysis
GPT-OSS-120B[3] | Entity extraction and relationship mapping for knowledge graph construction
Magistral-Small-2507 | Sentiment analysis and summarization for news and filing content

Table 4 | Model Deployment Configuration

The Magistral-Small-2507 model runs with four replicas to handle high-volume summarization and sentiment scoring workloads. GPT-OSS-120B similarly runs multiple replicas for entity extraction tasks. The 235-billion-parameter reasoning and vision-language models each require substantial GPU memory but deliver the analytical depth needed for regulatory compliance decisions.

Data Pipeline Architecture: The data pipeline transforms raw market signals into structured risk insights through four processing stages. Kafka receives streaming data from external sources including SEC EDGAR, news APIs, macroeconomic feeds, and regulatory notice systems. The event streamer normalizes incoming data formats and routes messages to appropriate processing queues based on content type and priority.

Docling handles document parsing, using vision-language models to extract structured content from PDFs containing charts, tables, and mixed layouts. Unlike traditional optical character recognition approaches, the VLM-based pipeline captures document semantics, enabling more accurate extraction of financial data even from complex multi-column filings. The CPU handles initial document preprocessing, while the GPU executes the vision-language inference.

Graphiti constructs the knowledge graph by identifying entities, relationships, and temporal connections within processed documents. The resulting graph structure, stored in Neo4j, captures how issuers connect to sectors, how regulatory rules apply to asset classes, and how portfolio positions link to risk factors. This contextualized representation enables agents to trace the impact of any market event across the full scope of portfolio holdings and compliance mandates.

Agent Orchestration: The Microsoft Agent Framework coordinates the specialized agents that execute IPRA's analytical workflows. An intelligent orchestration engine routes incoming requests and events to the appropriate agent based on task type and current system load. The Portfolio Impact Agent and Compliance Monitor Agent operate continuously, processing each incoming signal against relevant portfolio positions and mandates. The Regulatory Audit Agent maintains audit trails, while the Scenario Stress Agent responds to user-initiated requests for stress testing and sensitivity analysis.

Each agent accesses shared resources through well-defined interfaces: the knowledge graph for contextual relationships, PostgreSQL for portfolio and compliance data, and the model inference endpoints for AI-powered analysis. This separation of concerns enables independent scaling and updates to individual components without disrupting overall system operation.

Accuracy Controls and Design Safeguards

IPRA mitigates hallucination risk through a multi-layered approach. Every LLM-generated assessment is grounded in source data retrieved from the knowledge graph and PostgreSQL. The Compliance Monitor Agent applies deterministic threshold checks before invoking LLM-based qualitative assessments, ensuring that quantitative compliance limits are enforced. The Regulatory Audit Agent logs the complete evidence chain for each finding, enabling human reviewers to verify any AI-generated recommendation against its source material.
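The deterministic-first ordering can be sketched as a small gate function. The rule names, thresholds, and escalation criterion below are hypothetical, not IPRA's actual mandate set.

```python
# Sketch of the deterministic-first compliance gate. Rule names,
# thresholds, and the LLM-escalation criterion are assumptions.
RULES = {
    "max_single_issuer_weight": 0.10,  # 10% concentration cap
    "min_cash_buffer": 0.02,           # 2% liquidity floor
}

def check_position(weight: float, cash_ratio: float) -> dict:
    """Apply hard quantitative limits first; only borderline cases
    are escalated to an LLM-based qualitative assessment."""
    findings = []
    if weight > RULES["max_single_issuer_weight"]:
        findings.append(("fail", "concentration limit breached"))
    if cash_ratio < RULES["min_cash_buffer"]:
        findings.append(("fail", "liquidity buffer below floor"))
    status = "fail" if findings else "pass"
    # Positions within 10% of the concentration cap get LLM review.
    needs_llm = (status == "pass"
                 and weight > 0.9 * RULES["max_single_issuer_weight"])
    return {"status": status, "findings": findings,
            "escalate_to_llm": needs_llm}

result = check_position(weight=0.12, cash_ratio=0.03)
```

Because the hard limits fire before any model call, a quantitative breach can never be masked by an LLM's qualitative judgment, which is the property the audit trail depends on.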

Security and Observability: Valkey, a Redis-compatible key-value store, manages authentication tokens and session state. All user interactions require authenticated sessions, and the system logs every query, analysis result, and recommendation for regulatory audit purposes. Prometheus collects system metrics from all components, enabling operations teams to monitor GPU utilization, inference latency, and pipeline throughput in real time.

Dell's Integrated Dell Remote Access Controller (iDRAC) provides out-of-band management for the underlying server infrastructure. Operations teams can monitor hardware health, thermal status, and power consumption independent of the operating system. iDRAC's secure remote management capabilities enable infrastructure teams to perform firmware updates, diagnose issues, and restore service without physical access to the data center, a critical requirement for institutions with distributed operations or limited on-site staff.

The entire architecture operates within the institution's network perimeter. No data leaves the secure environment, and all model inference executes locally on the dedicated hardware. This sovereign deployment model satisfies regulatory requirements that prohibit transmission of sensitive portfolio and client data to external cloud services.

Infrastructure Foundation

Sovereign AI infrastructure for regulated financial services must satisfy three requirements on a single platform: run large-scale models without aggressive compression, keep all data within a single physical boundary for simplified audit controls, and sustain peak throughput during extended market stress events. The Dell PowerEdge XE9785L server with AMD Instinct MI355X accelerators addresses all three, delivering the compute density and memory capacity that IPRA's multi-agent architecture requires for continuous inference, document ingestion, and compliance monitoring.

Component | Specification
Server | Dell PowerEdge XE9785L
CPU | AMD EPYC 9965 (192 cores, 384 threads)
System Memory | 2.95 TB DDR5
GPU Accelerators | 8x AMD Instinct MI355X Accelerators
GPU Memory | 2.3 TB aggregate HBM3e (288 GB per accelerator)
Cooling | Direct Liquid Cooling

Table 5 | Dell PowerEdge XE9785L Hardware Configuration

Direct liquid cooling enables the Dell PowerEdge XE9785L to sustain full GPU power draw across all eight AMD Instinct MI355X accelerators without thermal throttling, even during extended market stress events that demand continuous high-concurrency inference. For regulated institutions running 24/7 risk monitoring, this thermal headroom translates directly into predictable, uninterrupted performance at peak loads.

The AMD Instinct MI355X accelerator provides 288 GB of HBM3e memory per accelerator, a 50 percent increase over the 192 GB available on the previous-generation MI300X. This expanded capacity gives institutions a strategic choice: deploy larger, more capable models for deeper analysis, or run more replicas of production models for higher throughput. IPRA leverages both options simultaneously. The 235-billion-parameter reasoning and vision-language models consume significant memory but deliver the analytical depth required for complex regulatory decisions. Meanwhile, multiple Magistral-Small replicas handle high-volume summarization tasks in parallel. On the MI300X platform, this combination would require aggressive quantization or distribution across multiple servers, adding latency and operational complexity that regulated institutions seek to avoid.

With 2.3 TB of aggregate GPU memory, the XE9785L hosts all four production models simultaneously: Qwen3-VL-235B for document understanding, Qwen3-235B for compliance reasoning, GPT-OSS-120B for entity extraction, and multiple Magistral-Small-2507 replicas for summarization. This consolidation onto a single server streamlines procurement, reduces data center footprint, and eliminates inter-server communication latency. All data remains within one physical boundary, reducing the attack surface and simplifying the access controls that compliance teams must audit.

The AMD EPYC 9965 processor handles CPU-bound preprocessing tasks, including Kafka event streaming, Docling document parsing, and knowledge graph updates. During peak ingestion periods, when regulatory agencies publish multiple notices or earnings-season filings arrive in concentrated bursts, the 192-core processor prevents CPU bottlenecks from constraining pipeline throughput. Direct liquid cooling maintains optimal temperatures during extended operation, enabling consistent performance during prolonged market stress events.


Figure 4 | 8x AMD Instinct MI355X Accelerator Model Map

How IPRA Compares

The following table summarizes how IPRA's architecture addresses the limitations of legacy batch systems and cloud-based AI platforms in regulated financial environments.

Capability | Legacy Batch Systems | Cloud AI Platforms | IPRA on Dell PowerEdge
Processing Latency | Multi-hour batch cycles | Near real-time, subject to network latency | Continuous, on-premises, sub-minute signal processing
Data Sovereignty | On-premises, fully controlled | Introduces data residency considerations | On-premises, all data stays within institutional boundaries
Compliance Monitoring | Periodic, post-analysis checks | Varies by provider; audit trail gaps common | Continuous, integrated with every analysis cycle
Audit Trail | Manual assembly from multiple systems | Provider-dependent, may limit traceability | Automated, source-linked, regulator-ready
Unstructured Data Processing | Limited or manual | Strong, but subject to data residency constraints | GPU-accelerated NLP on-premises with large-scale models
Scalability | Requires multi-system expansion | Elastic, but introduces sovereignty risk | Linear scaling on a single server; horizontal expansion available
Infrastructure Footprint | Multiple servers, disparate software stacks | Cloud tenancy, shared infrastructure | Single Dell PowerEdge XE9785L server with consolidated model deployment

Table 6 | Capability Comparison Across Risk Intelligence Approaches

Performance Benchmarking

The benchmarking program validates IPRA's ability to deliver continuous risk intelligence at institutional scale on a single Dell PowerEdge XE9785L server. Testing focused on the two key throughput metrics that matter most to risk operations: how many holdings the system can analyze per minute and how many compliance checks it can execute concurrently.

Methodology

The benchmark simulates a realistic stress scenario based on the March 2023 Silicon Valley Bank collapse, a contagion event that spread rapidly across technology stocks, regional banks, and stablecoins. The benchmarking team selected this scenario because it exercises every component of the pipeline: filings trigger credit reassessment, news drives sentiment shifts, and cross-sector contagion forces the system to evaluate correlated risks across multiple portfolio positions simultaneously.

Each test portfolio contains approximately 26 holdings drawn from 10 base institutional portfolio templates spanning equities, corporate debt, commodities, and forex positions. The system scales load by replicating these portfolios and processing them concurrently. Tests sweep across concurrency levels from 300 to 850 simultaneous portfolios, with each level running for 45 minutes while metrics are sampled every 30 seconds. A drain and cooldown phase separates each run to ensure clean measurement.

Two primary metrics capture IPRA's operational throughput:

Holdings analyzed per minute measures the count of individual ticker positions that receive a complete four-dimensional risk assessment covering credit, market, liquidity, and regulatory risk. Each holding analysis draws on data from PostgreSQL (portfolio weights, financial fundamentals, historical prices) and the knowledge graph (scenario-specific news, regulatory filings). The system computes quantitative risk scores using Altman Z-Score, Value at Risk, Conditional VaR, GARCH volatility modeling, and the Amihud illiquidity ratio, then enriches results with knowledge-graph-sourced news sentiment.
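Two of the metrics named above, historical Value at Risk and Conditional VaR (expected shortfall), can be sketched in plain Python. The return series and confidence level below are illustrative, not benchmark data; production IPRA computes these alongside Altman Z-Score, GARCH volatility, and the Amihud ratio.

```python
def historical_var_cvar(returns: list[float], confidence: float = 0.95):
    """Historical-simulation VaR and CVaR at the given confidence
    level. Losses are returned as positive numbers."""
    losses = sorted(-r for r in returns)      # losses, ascending
    cut = min(int(confidence * len(losses)), len(losses) - 1)
    var = losses[cut]                         # tail-boundary loss
    tail = losses[cut:]
    cvar = sum(tail) / len(tail)              # mean loss in the tail
    return var, cvar

# Illustrative daily returns for one holding; not benchmark data.
rets = [0.01, -0.02, 0.005, -0.035, 0.012, -0.01, 0.003, -0.05,
        0.02, -0.004, 0.007, -0.015, 0.001, -0.025, 0.009, -0.002,
        0.004, -0.006, 0.011, -0.008]
var90, cvar90 = historical_var_cvar(rets, 0.90)
```

CVaR is always at least as large as VaR at the same confidence level, since it averages over the losses beyond the VaR boundary.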

Compliance checks per minute measures the count of individual holdings (tickers) that complete the full compliance evaluation pipeline. Each completed check encompasses 20 to 30 individual rule evaluations per holding, including deterministic threshold assessments and LLM-based qualitative evaluations. The compliance pipeline retrieves evidence from the knowledge graph, applies deterministic threshold checks and LLM-based qualitative assessments, and resolves conflicts before recording each result with full audit metadata.

Dell's Integrated Dell Remote Access Controller (iDRAC) provided out-of-band hardware monitoring throughout all test runs, enabling the benchmarking team to verify thermal stability, GPU power draw, and system health independent of the operating system during each 45-minute stress cycle.

Configuration | Details
Test Scenario | Tech and Banking Contagion (SVB Collapse)
Base Portfolios | 10 institutional portfolio templates
Holdings per Portfolio | ~26 assets
Concurrency Sweep | 300 to 850 (incrementing by 50)
Test Duration per Level | 45 minutes (90 samples at 30-second intervals)
Reasoning Model | amd/Qwen3-235B-A22B-Thinking-2507-ptpc
Scenario Playback | Timestamped news events at 100x speed via Kafka

Table 7 | Benchmarking Configuration

Results

On the Dell PowerEdge XE9785L server with eight AMD Instinct MI355X accelerators, IPRA demonstrated consistent, near-linear throughput scaling from 300 to 850 concurrent portfolios. Each concurrency level ran for 45 minutes with metrics sampled every 30 seconds.


Figure 5 | Holdings Analyzed per Minute by Concurrent Portfolio Count


Figure 6 | Compliance Checks per Minute by Concurrent Portfolio Count

The data reveals three important operational characteristics of the MI355X platform under continuous FSI workloads:

  • Linear throughput scaling through the operational range. Holdings throughput scales from 558 holdings/min at 300 concurrent portfolios to 1,415 holdings/min at 850 concurrent portfolios, representing a 2.54x increase as load grows 2.83x. Compliance throughput follows a similar trajectory, scaling from 53.35 to 108.90 checks/min. This near-linear scaling behavior confirms that the MI355X platform maintains consistent inference performance without degradation as concurrent workload increases.
  • Stable generation rate across load conditions. The Max Generation Tokens/sec column remains between 1,339 and 1,599 tokens/sec regardless of concurrency level. This stability reflects the input-dominated nature of FSI risk workloads: each holding evaluation requires extensive context ingestion (news, filings, portfolio data, compliance rules) before generating a comparatively short structured output. The generation phase is not the throughput bottleneck.
  • Continued headroom at 850 concurrent portfolios. The throughput curve does not show signs of saturation at the highest tested concurrency level. Holdings throughput increased by 9.8% between the 800 and 850 data points (1,288.75 to 1,415.39), suggesting the MI355X node retains additional capacity beyond the tested range. This headroom provides operational margin for production deployments that experience periodic load spikes during market events.

An important distinction exists between raw inference capacity and application-level throughput. When comparing the MI355X and MI300X platforms under the same IPRA workload configuration, application-level metrics such as holdings analyzed per minute and compliance checks per minute show only marginal differences, typically 1 to 3 percent. This result reflects the current pipeline architecture, not GPU capability. IPRA's orchestration layer dispatches a fixed batch of tasks per cycle and waits for the full batch to complete before initiating the next. Both GPU platforms therefore receive work at the same controlled rate, and the faster MI355X simply completes its assigned inference sooner. The raw inference benchmarks presented in the Generational Performance section confirm the MI355X delivers 3.3x to 4.3x more tokens per second at the model level. This gap represents built-in headroom to add more agents, increase dispatch rates, or deploy larger models without adding servers.
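The dispatch-limited behavior described above can be modeled with a toy calculation: when the orchestrator issues a fixed batch and waits out a fixed cycle before the next dispatch, a faster GPU only shortens its idle time within the cycle, leaving application-level throughput unchanged. All numbers below are illustrative assumptions, not measured values.

```python
def app_throughput(batch_size: int, dispatch_cycle_s: float,
                   gpu_tokens_per_s: float, tokens_per_task: float) -> float:
    """Tasks per minute when the orchestrator waits out a fixed
    dispatch cycle; GPU speed beyond the cycle becomes idle time."""
    inference_s = batch_size * tokens_per_task / gpu_tokens_per_s
    cycle = max(dispatch_cycle_s, inference_s)
    return 60 * batch_size / cycle

# Hypothetical: identical 10s dispatch cycle, two GPU generations.
slow_gpu = app_throughput(100, 10.0, gpu_tokens_per_s=1500, tokens_per_task=60)
fast_gpu = app_throughput(100, 10.0, gpu_tokens_per_s=6000, tokens_per_task=60)
```

In this model both GPUs deliver the same tasks per minute; the faster GPU's advantage only surfaces once the dispatch rate, agent count, or batch size is raised, which is the headroom argument made above.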

To contextualize these figures for portfolio management operations: at 850 concurrent portfolios with approximately 26 holdings each, the system monitors 22,100 individual asset positions simultaneously. At 1,415 holdings evaluated per minute, the system completes a full risk assessment pass across all monitored positions approximately every 15.6 minutes. For a firm managing 500 institutional portfolios, this delivers multiple complete risk cycles per hour, replacing the once-daily batch assessment that legacy systems provide. At cloud market-average rates, this translates to approximately $37 per portfolio per month for always-on, multi-dimensional risk monitoring, comparable to many batch-only platforms that provide far less frequent coverage.

All results were achieved on a single Dell PowerEdge XE9785L server, confirming that the platform delivers institutional-scale risk intelligence without multi-server expansion.

Cloud Infrastructure Cost Context

The following pricing analysis provides directional context for infrastructure planning, not a complete total cost of ownership (TCO) model. The estimates focus on GPU compute costs, which represent the dominant variable cost component for AI inference workloads. A production TCO assessment would additionally account for storage, networking, data-feed licensing, egress charges, support contracts, staffing, and utilization variability.

MI355X 8-GPU Node Pricing

Cloud GPU pricing for MI355X 8-GPU nodes varies based on provider, commitment term, and availability tier. The table below summarizes current market pricing for an 8-GPU MI355X node operating continuously (8,760 hours per year). Pricing reflects publicly listed rates from named providers as of February 2026.

Pricing Tier | Per-GPU/hr | 8-GPU Node/hr | Annual Cost (24/7)
Reserved (48-month) | $2.29[4] | $18.32 | ~$160,000
On-Demand (Lowest Listed) | $2.95[5] | $23.60 | ~$207,000
Market Average (4 Providers) | $5.45[6] | $43.60 | ~$382,000
Full On-Demand (OCI BM.GPU.MI355X.8) | $8.60[7] | $68.80 | ~$603,000

Table 8 | MI355X Cloud GPU Pricing (8-GPU Node, Annual 24/7 Operation)

For most organizations, a 1-year enterprise commitment falls in the $250,000 to $400,000 per year range. These rates reflect the current cloud GPU market for MI355X accelerators and are subject to change as additional providers bring MI355X capacity online. Four providers currently offer MI355X nodes: Vultr, Oracle Cloud, TensorWave (custom pricing), and Crusoe (reserved capacity). Organizations should request current quotes from preferred providers at the time of procurement.

Cost per Portfolio

A meaningful way to evaluate infrastructure cost is on a per-portfolio basis. This allows direct comparison against existing risk platform licensing fees and translates cloud GPU economics into terms that align with how financial institutions budget for risk technology.

Calculation methodology: Per-portfolio cost divides the annual node cost by the peak tested capacity of 850 concurrent portfolios. For example, at the market average rate of $382,000 per year: $382,000 / 850 portfolios = $449 per portfolio per year, or approximately $37 per portfolio per month. This calculation assumes continuous 24/7 operation with the node running at peak portfolio capacity. Actual per-portfolio cost will vary based on the number of portfolios monitored, utilization patterns, and the pricing tier selected.
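The per-portfolio arithmetic above can be reproduced in a few lines for verification:

```python
def per_portfolio(annual_node_cost: float, portfolios: int = 850):
    """Annual and monthly cost per monitored portfolio, assuming
    continuous 24/7 operation at peak tested capacity."""
    annual = annual_node_cost / portfolios
    return annual, annual / 12

# Market-average tier from Table 8: ~$382,000/year for the 8-GPU node.
annual_cost, monthly_cost = per_portfolio(382_000)
```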

Pricing Tier | Annual Cost per Portfolio | Monthly Cost per Portfolio
Reserved (48-month) | ~$188 | ~$16
Enterprise (1-Year) | $294 to $470 | $25 to $39
Full On-Demand | ~$709 | ~$59

Table 9 | Per-Portfolio Cost at 850 Concurrent Portfolios

These per-portfolio costs cover continuous, 24/7 risk monitoring with multi-dimensional analysis across credit, market, liquidity, and regulatory risk. Compared to traditional FSI risk analytics platforms that license per-seat or per-portfolio and typically deliver only batch-mode analysis, the GPU-accelerated approach provides a fundamentally different value proposition: always-on monitoring at a comparable or lower per-unit cost.

Scaling Considerations

Organizations that monitor more than 850 portfolios can deploy additional MI355X nodes with near-linear cost scaling. Two 8-GPU nodes would support approximately 1,700 concurrent portfolios at double the infrastructure cost. The Dell PowerEdge XE9785L's rack-dense form factor enables efficient scaling within existing data center footprints, and the AMD ROCm software stack supports multi-node orchestration without proprietary licensing overhead.

For organizations considering on-premises deployment to meet data residency requirements, the capital expenditure model changes but the per-portfolio economics remain favorable. A 3-year amortization of on-premises MI355X infrastructure typically aligns with or improves upon the reserved cloud pricing tier, with the additional benefit of eliminating recurring cloud egress and storage fees.

Generational Performance Gains: MI355X vs. MI300X

The generational comparison quantifies what upgrading from MI300X to MI355X accelerators delivers, benchmarked on Dell PowerEdge platforms (MI355X on the XE9785L, MI300X on the XE9680). Three results define the upgrade value: 4.3x peak throughput on short-form alert workloads at 8,192 concurrent requests, 64x more concurrent compliance sessions within the 100ms TPOT SLA (8,192 vs. 128 on MI300X), and up to 2.2x more tokens per GPU watt at high concurrency. The detailed tables and charts that follow provide the supporting evidence across workload types and concurrency levels.

Testing used the Qwen3-235B-A22B-Thinking-2507-ptpc reasoning model, the same model IPRA deploys for compliance assessment and risk analysis, across a sweep of input/output token lengths and concurrency levels ranging from 1 to 8,192 simultaneous requests. All runs with error rates exceeding 10 percent were excluded. GPU total power is computed as the sum of all eight individual GPU power sensor readings, representing the authoritative power metric for efficiency calculations.

The following table holds concurrency fixed at 4,096 simultaneous requests and varies the input/output token configuration to isolate how workload type affects the generational advantage.

| Config (In/Out) | MI355X tok/s | MI300X tok/s | MI355X GPU Power (W) | MI300X GPU Power (W) | Throughput Gain | FSI Use Cases |
| --- | --- | --- | --- | --- | --- | --- |
| 128/128 | 39,930 | 9,568 | 9,146 | 4,573 | 4.2x | Trade alerts, chatbots |
| 2,048/128 | 7,438 | 2,295 | 10,260 | 4,737 | 3.2x | Compliance, KYC, AML |
| 128/2,048 | 19,414 | 9,041 | 8,507* | 5,583* | 2.1x | Research notes, summaries |
| 2,048/2,048 | 8,810 | 6,302 | 8,693* | 5,695* | 1.4x | Risk reports, stress tests |

Table 10 | Qwen3-235B Raw Inference: MI355X vs. MI300X at 4,096 Concurrent Requests

* GPU total power values for the 128/2,048 and 2,048/2,048 configurations are measured at 1,024 concurrent requests (the highest available data point for these workloads), while throughput figures reflect 4,096 concurrent requests. Because power draw typically increases with concurrency, the actual GPU power at 4,096 concurrent would likely be higher than shown, meaning the tokens-per-watt figures for these two rows may overstate efficiency.


Figure 8 | Raw Inference Throughput Scaling: MI355X vs. MI300X

Figure 8 illustrates the concurrency scaling behavior across 1 to 4,096 simultaneous requests. While both platforms scale with increasing concurrency, the MI355X sustains materially higher token throughput, particularly in short-context, high-parallel workloads characteristic of real-time risk scoring. The divergence at higher concurrency levels reflects improved HBM3e bandwidth and compute efficiency.

Peak Throughput: Scaling Under Concurrent Load

On the Dell PowerEdge XE9785L, the most consequential difference between accelerator generations is sustained throughput under concurrent load. The MI300X saturates at approximately 1,024 concurrent requests on short-form workloads and delivers no additional throughput beyond that point, while the MI355X continues scaling to 8,192 concurrent requests and beyond.

| Token Config | MI355X Peak tok/s | At Concurrent | MI300X Peak tok/s | At Concurrent | Gain | FSI Use Case |
| --- | --- | --- | --- | --- | --- | --- |
| 128/128 | 43,801 | 8,192 | 10,143 | 1,024 | 4.3x | Alerts, chatbots |
| 128/2,048 | 19,916 | 8,192 | 10,490 | 1,024 | 1.9x | Research notes |
| 2,048/128 | 7,709 | 8,192 | 2,352 | 8,192 | 3.3x | Compliance, KYC |
| 2,048/2,048 | 8,811 | 4,096 | 6,566 | 1,024 | 1.3x | Risk reports |

Table 11 | Peak Throughput by Workload: MI355X vs. MI300X

The concurrency scaling behavior on the short-form 128/128 workload illustrates this dynamic clearly. The MI300X reaches its throughput ceiling of approximately 10,143 tokens per second at 1,024 concurrent requests and delivers no meaningful increase beyond that point. The MI355X continues scaling through 2,048, 4,096, and 8,192 concurrent sessions, reaching 43,801 tokens per second at peak. In a real-time FSI environment, trade alerts, client notification engines, and fraud signals generate bursty concurrent demand. This scaling headroom determines whether the system responds in real time or queues requests during peak market hours.


Figure 9 | Concurrency Scaling: 128/128 Short-Form Workload

At the standard 1,024 concurrent comparison point, the MI355X already delivers meaningful throughput advantages across all workload types.

| Config | MI355X tok/s @ 1,024 | MI300X tok/s @ 1,024 | Throughput Gain | FSI Use Case |
| --- | --- | --- | --- | --- |
| 128/128 | 16,097 | 10,143 | 1.6x | Trade alerts, signals |
| 128/2,048 | 15,012 | 10,490 | 1.4x | Research summaries |
| 2,048/128 | 7,328 | 2,235 | 3.3x | Compliance, KYC, AML |
| 2,048/2,048 | 8,559 | 6,566 | 1.3x | Risk reports |

Table 12 | Throughput at 1,024 Concurrent: Like-for-Like Comparison

Real-Time Latency: Meeting the 100ms SLA Under Load

For production FSI systems deployed on the Dell PowerEdge XE9785L, throughput alone does not determine deployment readiness. Per-token latency under concurrent load is the metric that separates a system that can serve a compliance dashboard from one that cannot. Two metrics capture this behavior.

TPOT (Time Per Output Token) P95 measures the 95th-percentile milliseconds between consecutive generated tokens. This is the production-grade SLA measure. If TPOT P95 remains below 100 milliseconds, 95 percent of all users at that concurrency level experience real-time-feeling responses.

TTFT (Time to First Token) P95 measures how long users wait before the response begins. This metric is dominated by input processing (prefill latency) and is critical for user-facing applications where perceived responsiveness determines adoption.

A TPOT P95 below 100 milliseconds is the standard threshold for real-time AI in financial services, covering trade alert generation, pre-trade risk checks, and live compliance screening.
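As a minimal sketch of how a TPOT P95 check against the 100ms SLA could be computed from per-token emission timestamps: the function name and sample data below are illustrative assumptions, not part of the benchmark harness used in this paper.

```python
# Sketch: TPOT P95 from a request's per-token emission timestamps (seconds).
import statistics

SLA_MS = 100.0


def tpot_p95_ms(token_timestamps_s):
    """P95 of inter-token gaps in ms, excluding the wait for the first token (TTFT)."""
    gaps_ms = [
        (later - earlier) * 1000.0
        for earlier, later in zip(token_timestamps_s, token_timestamps_s[1:])
    ]
    return statistics.quantiles(gaps_ms, n=20)[-1]  # 19th cut point = 95th percentile


# A request emitting one token every 80 ms stays comfortably within the SLA.
timestamps = [i * 0.080 for i in range(50)]
print(f"TPOT P95: {tpot_p95_ms(timestamps):.1f} ms (SLA: {SLA_MS:.0f} ms)")
```

In production, the P95 would be taken across all inter-token gaps observed at a given concurrency level, which is how the per-workload limits in the table below are determined.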

| Workload | MI300X Max Conc. within 100ms | MI300X TPOT at Limit | MI355X Max Conc. within 100ms | MI355X TPOT at Limit | Capacity Advantage |
| --- | --- | --- | --- | --- | --- |
| 128/128 (Alerts) | 1,024 | 97.8ms | 2,048 | 83.8ms | 2x more concurrent |
| 128/2,048 (Research) | 1,024 | 93.8ms | 1,024 | 63.1ms | Same concurrency, 32% faster TPOT |
| 2,048/128 (Compliance) | 128 | 82.3ms | 8,192 | 99.6ms | 64x more concurrent |
| 2,048/2,048 (Risk Reports) | 512 | 99.6ms | 4,096 | 67.6ms | 8x more concurrent |

Table 13 | Maximum Concurrent Sessions Within 100ms TPOT P95 SLA

The latency metrics in this section measure raw model inference responsiveness (TTFT and TPOT), not end-to-end workflow completion time. In production, a single IPRA risk assessment involves multiple sequential inference calls, knowledge graph queries, and data retrievals. End-to-end workflow completion ranges from approximately 100 to 460 seconds depending on portfolio complexity and concurrency level.

TPOT P95 Detail: 2,048/128 Compliance Screening

The compliance workload (2,048 input / 128 output) presents the most decisive contrast. This token profile mirrors production compliance systems: the model reads a long transaction record, contract, or regulatory filing and produces a short classification or verdict. The MI300X breaches the 100ms TPOT threshold at just 256 concurrent sessions and exceeds the SLA by 6x at higher concurrency levels. The MI355X holds under 100ms all the way through 8,192 concurrent sessions, delivering 64x more capacity within the same latency guarantee.

| Concurrent Sessions | MI300X TPOT P95 | MI355X TPOT P95 | Status |
| --- | --- | --- | --- |
| 128 | 82.3ms | 41.3ms | Both within SLA |
| 256 | 124.4ms (BREACH) | 52.7ms | MI300X breaches SLA |
| 512 | 237.7ms (BREACH) | 75.5ms | MI300X 2.4x over SLA |
| 1,024 | 455.6ms (BREACH) | 99.2ms | MI300X 4.6x over SLA |
| 2,048 | 607.5ms (BREACH) | 99.5ms | MI300X 6.1x over SLA |
| 4,096 | 608.0ms (BREACH) | 99.5ms | MI300X 6.1x over SLA |
| 8,192 | 605.7ms (BREACH) | 99.6ms | MI300X 6.1x over SLA |

Table 14 | TPOT P95 Detail: 2,048/128 Compliance Screening Workload

The operational implication is direct. Before the trading floor has fully loaded its morning queue, the MI300X is already at its latency limit on compliance workloads. The MI355X holds the same SLA at 8,192 concurrent sessions, serving the full trading day's compliance volume without queuing or degradation.

TPOT P95 Detail: 128/128 Trade Alerts and Short-Form

| Concurrent Sessions | MI300X TPOT P95 | MI355X TPOT P95 | Note |
| --- | --- | --- | --- |
| 512 | 69.6ms | 47.7ms | MI355X 32% faster |
| 1,024 | 97.8ms (at edge) | 59.4ms | MI300X at limit; MI355X has 40ms headroom |
| 2,048 | 299.9ms (BREACH) | 83.8ms | MI355X still within SLA |
| 4,096 | 406.7ms (BREACH) | 100.2ms | MI355X at limit |
| 8,192 | 416.1ms (BREACH) | 208.1ms (BREACH) | Both exceed SLA at 8,192 concurrent |

Table 15 | TPOT P95 Detail: 128/128 Trade Alert Workload

Time to First Token: 2,048/128 Compliance Workload

In addition to per-token latency, the MI355X delivers responses significantly faster at the prefill stage. Time to First Token determines how quickly a compliance analyst sees the system begin responding. This metric is critical for user-perceived responsiveness in interactive review interfaces where analysts process hundreds of documents per shift.


Figure 10 | TTFT P95: 2,048/128 Compliance Workload (Shorter is Better)

At 512 concurrent sessions, a compliance analyst on the MI300X waits over 17 seconds before seeing the first token of a response. On the MI355X, the same analyst receives the first token in under 4 seconds. For interactive compliance review workflows where analysts process hundreds of documents per shift, this difference compounds into hours of recovered productivity across a team.
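A back-of-envelope calculation makes the productivity claim concrete. The 17-second and 4-second first-token waits come from the TTFT comparison at 512 concurrent sessions above; the 300-documents-per-shift workload is our own illustrative assumption (standing in for "hundreds of documents per shift"), not a measured figure.

```python
# Illustrative estimate of first-token wait time recovered per analyst per shift.
docs_per_shift = 300               # assumed analyst workload (illustrative)
ttft_mi300x_s = 17.0               # first-token wait at 512 concurrent, MI300X
ttft_mi355x_s = 4.0                # first-token wait at 512 concurrent, MI355X

saved_minutes = docs_per_shift * (ttft_mi300x_s - ttft_mi355x_s) / 60
print(f"~{saved_minutes:.0f} minutes of waiting recovered per analyst per shift")
# → ~65 minutes; across a team of analysts this compounds into hours per day.
```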

For any FSI production system requiring both high concurrency and guaranteed per-token latency, the combination of TPOT and TTFT data, measured on a single Dell PowerEdge XE9785L, is unambiguous. The MI355X holds the 100ms TPOT SLA at 64x more concurrent sessions while also delivering first-token responses 3.6x to 4.7x faster.

GPU Efficiency: Tokens per Watt at Scale

On the Dell PowerEdge XE9785L, GPU efficiency at low to moderate concurrency is broadly comparable between the MI355X and MI300X. The decisive difference emerges at high concurrency, where MI300X throughput plateaus but its GPU power consumption remains elevated. The MI355X continues scaling throughput while power grows modestly, delivering 2.2x more tokens per GPU watt at 8,192 concurrent requests on the 128/128 workload.


Figure 11 | Tokens-Per-GPU-Watt Scaling: 128/128 Short-Form Workload

The pattern is consistent: the MI355X draws more absolute GPU power than the MI300X at every concurrency level. The efficiency advantage emerges because throughput scales faster than power consumption. At 1,024 concurrent, both platforms deliver approximately the same tokens per watt. At 2,048 concurrent and above, the MI300X's throughput plateaus while its power draw remains largely unchanged, creating a widening efficiency gap that reaches 2.2x at peak concurrency.

| Config | MI355X tok/s | MI355X GPU W | MI355X tok/W | MI300X tok/s | MI300X GPU W | MI300X tok/W | Efficiency (MI355X vs. MI300X) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 128/128 | 16,097 | 8,564 | 1.88 | 10,143 | 5,460 | 1.86 | ~same (1.01x) |
| 128/2,048 | 15,012 | 8,507 | 1.77 | 10,490 | 5,583 | 1.88 | ~same (0.94x) |
| 2,048/128 | 7,328 | 10,266 | 0.714 | 2,235 | 4,701 | 0.475 | 1.50x MI355X |
| 2,048/2,048 | 8,559 | 8,693 | 0.985 | 6,566 | 5,695 | 1.153 | ~same (0.85x) |

Table 16 | Tokens-Per-GPU-Watt at 1,024 Concurrent: All Configurations

FSI data centers face power density constraints, rack space limits, and sustainability reporting requirements. The MI355X efficiency story centers on what happens beyond 1,024 concurrent, the inflection point where the MI300X stops delivering additional value but continues consuming similar GPU power. From 2,048 concurrent upward, the MI355X extracts 1.6x to 2.2x more inference work from every watt of GPU power.
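The tokens-per-GPU-watt figures follow directly from the measured throughput and power numbers. A minimal sketch, using two rows of values copied from Table 16 (the dictionary layout is ours):

```python
# Reproduce Table 16's tokens-per-GPU-watt figures from throughput / power.
measurements = {  # config: (mi355x_tok_s, mi355x_watts, mi300x_tok_s, mi300x_watts)
    "128/128": (16_097, 8_564, 10_143, 5_460),
    "2,048/128": (7_328, 10_266, 2_235, 4_701),
}

for config, (t355, w355, t300, w300) in measurements.items():
    eff355, eff300 = t355 / w355, t300 / w300
    print(f"{config}: MI355X {eff355:.2f} tok/W, MI300X {eff300:.2f} tok/W, "
          f"ratio {eff355 / eff300:.2f}x")
# The 2,048/128 row reproduces the ~1.50x efficiency advantage reported above.
```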

Compliance and Risk Analysis: The 2,048/128 Workload

The 2,048 input / 128 output token profile mirrors how AI operates in production FSI compliance systems on the Dell PowerEdge XE9785L: read a long transaction record, contract, or regulatory filing and produce a short classification, flag, or verdict. This workload exercises the prefill-heavy compute path that distinguishes compliance inference from general-purpose chatbot workloads. The MI355X delivers 3.3x more throughput on this profile, while the MI300X reaches a hard ceiling of approximately 2,352 tokens per second from 512 concurrent requests onward.


Figure 12 | Concurrency Scaling - Throughput (2,048/128 Compliance Workload)


Figure 13 | Concurrency Scaling - Tokens per GPU Watt (2,048/128 Compliance Workload)

The MI300X's throughput curve on this workload is effectively flat from 512 concurrent onward, fluctuating between 2,075 and 2,352 tokens per second regardless of how many additional requests arrive. The MI355X continues scaling from 3,050 tokens per second at 128 concurrent to 7,709 at 8,192, reaching a plateau only at the highest tested concurrency levels. The throughput advantage holds steady at 3.1x to 3.3x across the production-relevant concurrency range.

Operational Capacity: Daily Document Volume

Translating the Dell PowerEdge XE9785L's peak throughput into operational terms clarifies the infrastructure planning implications. The projections below assume 2,000 tokens per document and continuous operation, representative of production compliance screening pipelines.


Figure 14 | Estimated Document Processing Capacity at Peak Throughput

Based on MI355X peak of 7,709 tok/s and MI300X peak of 2,352 tok/s on the 2,048/128 workload. Assumes 2,000 tokens per document and continuous operation.
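The projection reduces to simple arithmetic. A sketch using the stated peak throughputs and the 2,000-tokens-per-document assumption (the function name is ours):

```python
# Sketch of the document-capacity projection behind Figure 14.
TOKENS_PER_DOC = 2_000     # stated assumption for compliance documents
SECONDS_PER_DAY = 86_400   # continuous 24/7 operation


def docs_per_day(tokens_per_second: float) -> int:
    """Documents processed per day at sustained throughput."""
    return int(tokens_per_second * SECONDS_PER_DAY / TOKENS_PER_DOC)


print(docs_per_day(7_709))  # MI355X peak on 2,048/128 → ~333,000 docs/day
print(docs_per_day(2_352))  # MI300X peak on 2,048/128 → ~101,600 docs/day
```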

The compliance workload advantage extends beyond raw throughput. Combining the data from this section with the latency analysis: the MI355X delivers 3.3x more compliance checks per second, produces first-token responses 4.0x to 4.7x faster, and maintains TPOT P95 under 100ms at 64x more concurrent sessions. For end-of-day regulatory batch runs, this means the system processes the full document queue without degradation. For interactive compliance review, analysts see responses begin in under 4 seconds instead of over 17. The MI300X's hard ceiling of approximately 2,352 tokens per second means compliance teams face a capacity wall that can only be addressed by adding more nodes. The MI355X processes 3.3x more compliance decisions at a single node, with 50 percent better tokens per GPU watt on this workload.

Conclusion

Financial institutions face a widening gap between the speed of market events and the capacity of legacy systems to detect, analyze, and act on the risks those events create. Batch processing, fragmented data sources, and bolt-on compliance checks leave risk officers hours behind during volatile conditions. At the same time, regulatory expectations for auditability, data sovereignty, and continuous monitoring continue to intensify.

The Institutional Portfolio Risk Agent closes this gap by combining continuous data ingestion, GPU-accelerated AI analysis, and automated compliance monitoring on a single on-premises server. Purpose-built for the requirements of regulated financial environments, IPRA delivers four measurable advances over legacy approaches.

  • Real-time risk visibility: IPRA eliminates hours-long detection delays. The platform continuously processes SEC filings, breaking news, macroeconomic indicators, and regulatory notices, linking each signal to portfolio positions through a knowledge graph that captures entity relationships in real time. Risk officers see exposure changes as they develop, not hours after the fact.
  • Continuous, embedded compliance: IPRA transforms mandate checks from a periodic checkpoint into a continuous process. The Compliance Monitor Agent evaluates every holding against institutional mandates and regulatory rules as new data arrives, maintaining a complete audit trail with source evidence, timestamps, and decision logic. Benchmarking confirmed the system scales to 850 concurrent portfolios, sustaining 109 compliance assessments per minute at peak load while preserving full audit integrity.
  • Sovereign, single-server deployment: IPRA keeps sensitive data within institutional boundaries. The entire platform, including four large-scale AI models, runs on a single Dell PowerEdge XE9785L server with 8x AMD Instinct MI355X accelerators. The 2.3 TB of aggregate GPU memory enables simultaneous deployment of 235-billion parameter reasoning and vision-language models without distributing workloads across multiple servers or relying on external cloud infrastructure.
  • Generational headroom with MI355X: Infrastructure testing confirms the platform is built for growth, not just for today's workload. At the raw inference level, the MI355X accelerator delivered up to 4.3x higher peak token throughput, maintained approximately 100ms per-token latency at 64x more concurrent sessions on compliance workloads, and achieved up to 2.2x better tokens-per-GPU-watt efficiency compared to the MI300X. This additional inference capacity represents forward-looking headroom: institutions can increase dispatch concurrency, deploy larger models, or add analytical agents within the same single-server footprint as requirements evolve.

The infrastructure economics reinforce the performance story. At peak capacity of 850 concurrent portfolios, GPU compute costs alone translate to approximately $25 to $39 per portfolio per month under typical enterprise cloud commitments. The MI355X further improves this equation at scale: from 2,048 concurrent requests upward, the accelerator extracts 1.6x to 2.2x more inference work per GPU watt than the MI300X, delivering more AI capacity per rack unit without additional power infrastructure.

For IT directors, CTOs, and infrastructure architects evaluating sovereign AI for financial services, the Dell PowerEdge XE9785L with AMD Instinct MI355X accelerators provides a validated, single-server foundation for continuous portfolio risk intelligence, deployable today and architected for tomorrow's workloads.

Addendum

System Under Test

| Type | Details |
| --- | --- |
| Model | Dell PowerEdge XE9785L server |
| No. of servers | 1 |
| CPU | AMD EPYC 9965 192-Core Processor |
| Memory | 12-channel DDR5, 128 GB DIMMs at 6400 MT/s, 2.95 TB total system memory (~614 GB/s theoretical peak memory bandwidth) |
| Storage | 3x Micron_7450_MTFDKCE3T8TFR |
| Operating System | Ubuntu 24.04.3 LTS |
| Kernel | 6.8.0-90-generic |
| GPU | 8x AMD Instinct MI355X Accelerators |

Gen-on-Gen Benchmarking Configuration

| Parameter | Details |
| --- | --- |
| Platforms Compared | 8x AMD Instinct MI355X on Dell PowerEdge XE9785L vs. 8x AMD Instinct MI300X on Dell PowerEdge XE9680 |
| Raw Inference Model | amd/Qwen3-235B-A22B-Thinking-2507-ptpc |
| Raw Inference Concurrency | 1 to 8,192 concurrent requests |
| Raw Inference Configurations | Input/output token lengths: 128/128, 2,048/128, 128/2,048, 2,048/2,048 |
| Solution Benchmark Scenario | Tech and Banking Contagion (SVB Collapse), identical to primary benchmark |
| Solution Concurrency Sweep | 300 to 850 simultaneous portfolios (incrementing by 50) |
| Reasoning Model Deployment | Tensor parallel size 2 (2 instances) on both MI355X and MI300X |
| Test Duration per Level | 45 minutes (90 samples at 30-second intervals) |

Key Performance Indicators (KPIs)

| Metric | Description |
| --- | --- |
| Holdings Throughput | Measures risk analysis capacity as individual asset positions evaluated per minute. |
| Compliance Throughput | Measures regulatory assessment capacity as the number of holdings that complete the full compliance evaluation pipeline per minute. Each assessment encompasses 20-30 individual rule evaluations. |
| Max Generation Tokens/sec | Measures peak reasoning-model token generation rate observed during active IPRA workload. Captures sustained GPU inference output while all pipeline components operate concurrently. |

References

[1] LexisNexis Risk Solutions, True Cost of Financial Crime Compliance Study: United States and Canada (Atlanta: LexisNexis, February 21, 2024), https://risk.lexisnexis.com/about-us/press-room/press-release/20240221-true-cost-of-compliance-us-ca

[2] Ascent RegTech, "The Not So Hidden Costs of Compliance," Ascent Blog, March 27, 2025; Shashank Guda, "Unstructured Data Management in Finance," Medium, November 4, 2025.

[3] GPT-OSS-120B is an open-source 120-billion parameter language model.

[4] Vultr, "Cloud GPU Pricing," Vultr.com, 2026. Reserved rate requires 48-month prepaid commitment.

[5] gpus.io, "AMD Instinct MI355X GPU Price Comparison," gpus.io, 2026. Reflects lowest on-demand rate across tracked providers.

[6] GetDeploying.com, "AMD MI355X: Price, Specs and Cloud Providers," GetDeploying.com, 2026. Average across 4 tracked MI355X cloud providers.

[7] Oracle Cloud Infrastructure, "Compute Pricing: GPU Instances," Oracle.com, 2026. BM.GPU.MI355X.8 bare metal instance on-demand rate.

Image Sources

Dell Images: Dell Technologies Inc. Dell PowerEdge XE9785L Server. Image source: Dell DAM via Dell.com

AMD Images: AMD Inc. AMD Instinct MI355X Accelerator. Image source: AMD Media Library (https://library.amd.com)


Copyright 2026 Metrum AI, Inc. All Rights Reserved. This project was commissioned by Dell Technologies. Dell, Dell PowerEdge and other trademarks are trademarks of Dell Inc. or its subsidiaries. AMD, Instinct, ROCm, EPYC and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other product names are the trademarks of their respective owners.

DISCLAIMER - Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance. Gen-on-gen comparisons reflect different server platforms (Dell PowerEdge XE9785L for MI355X, Dell PowerEdge XE9680 for MI300X). Differences in server architecture, memory configuration, and cooling may contribute to observed performance variations beyond GPU-level improvements.