Table Of Contents

| Introduction

Government organizations often face significant challenges in processing large volumes of legislative bills. Manual tasks such as data collection, verification, and summarization can delay critical insights, slowing down timely decision-making. Metrum AI offers a comprehensive solution to these challenges by leveraging small language models (SLMs) and agentic retrieval-augmented generation (RAG) techniques to streamline legislative bill analysis. This solution harnesses the power of AI agents—intelligent systems capable of operating semi-autonomously to complete tasks like summarization and data validation in offline, batch modes. Deployed on the latest Dell PowerEdge servers featuring 5th Gen AMD EPYC 9755 128-Core processors, this solution ensures both scalability and robust performance.

By running entirely on CPUs, Metrum AI’s solution allows organizations to leverage their existing CPU-based infrastructure, eliminating the need for costly specialized hardware. CPUs are versatile, capable of supporting mixed workloads, and widely available in most organizational environments, simplifying deployment and reducing overall costs. This innovative legislative bill analysis solution demonstrates the capability of modern CPU-based systems powered by AMD EPYC processors to efficiently execute full agentic RAG workflows, with proven scalability across diverse industries, including manufacturing, healthcare, transportation, and telecommunications.

| Key Highlights

| Solution Overview

Metrum AI’s legislative assistant simplifies legislative bill analysis by leveraging specialized AI agents to automate key tasks such as evaluating economic impact, legal compliance, and social or environmental effects. By combining AI-driven efficiency with human oversight, analysts can review findings, make adjustments, and ensure the creation of accurate and comprehensive reports.

The solution enables users to upload draft bills and submit them for processing, generating a detailed summary for each bill. Existing legislation and code documents are transformed into embeddings and stored in a vector database for retrieval. Targeted AI agents analyze the bills for economic, social, and legal impacts, as well as compliance, utilizing the retrieval-augmented generation (RAG) technique with existing legislative data. Powered by the advanced small language model, Llama 3.2 3B for text generation, the solution produces comprehensive bill analysis reports, highlighting specific sections to amend and referencing original documents for context. With a user-friendly interface featuring real-time performance monitoring and efficient report generation, Metrum AI’s legislative assistant enhances productivity, enabling teams to focus on high-impact legislative tasks.

Figure 1. User Interface of the Legislative Bill Analysis Solution.

Figure 2. vLLM Model Serving Performance of Llama 3.2 3B with BF16 Precision

This graph illustrates the throughput, measured in tokens per second, as a function of the number of concurrent requests.

This seamless workflow necessitated a hardware platform capable of delivering optimal performance and scalability, leading to a detailed evaluation of infrastructure options. The Dell PowerEdge R7725 server with 5th Gen AMD EPYC processors was ultimately selected as the hardware solution. Before finalizing this choice, we conducted extensive vLLM-based performance tests to measure throughput in tokens per second using the Llama 3.2 3B model, a cutting-edge small language model central to the solution. The results demonstrated a consistent increase in throughput as the number of concurrent requests grew.

As demonstrated, lower concurrency scenarios (e.g., 32 concurrent requests) deliver a per-request throughput exceeding 10 tokens per second—an industry benchmark for interactive applications like chatbots. In contrast, higher concurrency scenarios (e.g., 2,048 concurrent requests) achieve a per-request throughput of 1.53 tokens per second, which is well-suited for offline, batched processing tasks such as document summarization or long-form content generation. Since our solution leverages AI agents to handle documents in batch mode rather than requiring real-time responsiveness, these results underscore its ability to perform legislative analyses more efficiently than traditional, non-agentic methods. This efficiency is made possible by the exceptional memory capacity and processing power of the EPYC 9755 processors integrated with Dell PowerEdge servers.

Figure 3. Performance results for legislative bill analysis solution. It presents the total system throughput in tokens per second and the total time taken as a function of the number of concurrent documents reviewed.

With the completion of vLLM-based tests focused on model performance, we shifted our focus to evaluating the full AI agentic RAG stack within this workload. The results of solution-specific performance tests demonstrate remarkable scalability and efficiency in managing concurrent document analysis. The testing framework, designed to review 5-page legislative bill documents using a consistent prompt across varying levels of concurrency, offers a realistic measure of the system’s capabilities in high-volume legislative environments. The data highlights a clear correlation between the number of concurrent documents processed and the total system throughput in tokens per second:

This illustrates an 11.69x increase in throughput with only a 2.57x increase in processing time when comparing the analysis of 1 document to 32 documents. This highlights the efficiency of AI agents in executing batched tasks such as document summarization. The system’s ability to sustain high accuracy while significantly boosting throughput enables the rapid delivery of critical insights. This transformation allows organizations to shift from spending hundreds of hours on last-minute human expert analysis to completing automated legislative reviews within an hour of new legislation announcements.

To support this solution, we chose the Dell PowerEdge R7725 server equipped with 5th Gen AMD EPYC 9755 128-Core processors and high-speed DDR5 6000 memory. This hardware demonstrated exceptional performance in vLLM-based tests, making it ideal for running small language models (SLMs) like Llama 3.2 3B. Additionally, this configuration excels at supporting a range of AI agents, the backbone of our legislative bill analysis solution, while maintaining both speed and accuracy.

The table below shows the hardware configuration details for this solution.

Server	Dell PowerEdge R7725 Rack Server
Processor	2x AMD EPYC 9755 128-Core Processors
Memory	24 x 128GB DDR5 Memory (6000 MT/s)
Drive Bays	Dell NVMe PM1743 RI E3.S 3.84TB
Networking	BCM57504 NetXtreme-E (10Gb-200Gb) Ethernet
OS	Ubuntu 24.04.1 LTS

Figure 4. Table of Hardware Configuration Details for the Legislative Bill Analysis Solution.

| Solution Details and Workflow

Let’s explore the core of this solution: an agentic retrieval-augmented generation (RAG)* workflow that automates critical components of the legislative bill analysis process. Agentic RAG enhances traditional RAG by integrating autonomous agents capable of breaking down complex tasks, maintaining contextual awareness, and executing specific sub-tasks while collaborating effectively. This advanced approach is essential for evaluating multiple dimensions of proposed legislation—such as legal compliance, economic impact, and social or environmental effects—while incorporating human oversight to ensure precision. Below, we detail the step-by-step data flow within this system:

Figure 5: Solution Workflow.

Bill Upload and Preprocessing: Users begin by uploading draft bills or legislative documents into the system. During preprocessing, the system identifies critical sections requiring amendments and extracts relevant bullet points summarizing the bill’s purpose and provisions. This stage also incorporates document chunking, semantic segmentation, and metadata extraction to enhance retrieval efficiency and performance.
Contextual Retrieval via Vector Database: The system leverages a vector database (VectorDB) to store and retrieve legislative context, such as California Code documents, facilitating the comparison of the bill’s provisions against existing regulations. This contextual retrieval enables AI agents to accurately detect overlaps, conflicts, or areas requiring amendments. Vector embeddings, generated using the industry-standard embedding model bge-large-en, capture complex semantic relationships to ensure precise and relevant retrieval. The R7725’s support for up to 6TB of DDR5 memory at speeds of 6000 MT/s ensures high-speed vector operations and database queries, enabling seamless real-time retrieval of relevant legislative context.
Agentic RAG Analysis: AI agents are deployed to focus on various dimensions of the bill, specifically legal, economic, and environmental impacts. These agents leverage the retrieved context to analyze the documents thoroughly. Each agent maintains its own memory state and reasoning chain, allowing for persistent context awareness throughout the analysis process. The agents communicate through a structured message-passing protocol, enabling collaborative analysis when issues span multiple domains. Each agent generates targeted insights for its domain—legal, economic, or environmental—which are then synthesized into actionable recommendations. The PowerEdge R7725 enhances the performance of these specialized AI agents by accelerating AI operations, with support for up to 2 double-width GPUs.
Comprehensive Report Generation: Once all agents complete their tasks, the system leverages an SLM, Llama 3.2 3B, to compile these findings and generate a detailed report, which includes a summary of the bill, key insights, and recommendations for legal, economic, and environmental considerations with specific references to the original sections in the draft bill. The report generation process implements a hierarchical summarization strategy, first consolidating agent-specific findings before synthesizing them into a cohesive narrative. The final result is a fully synthesized report that aids lawmakers in making informed decisions. This efficient report generation and summarization is driven by AMD EPYC 9755 Processors, providing the necessary computational resources for complex language tasks with their high core count and large memory capacity.

This enhanced multi-agent RAG architecture significantly improves upon traditional single-agent RAG systems by enabling more comprehensive and nuanced analysis of complex legislative documents while maintaining scalability and reliability. The Dell PowerEdge R7725 with AMD EPYC 9755 processors, with its exceptional performance and flexibility, serves as an ideal foundation for this advanced legislative bill analysis solution.

| Solution Architecture

Figure 6. Solution Architecture.

The software stack incorporates the following key components to power this solution:

vLLM (v0.5.3.post1): An industry-standard library for optimized serving of open-source large language models (LLMs), featuring support for AMD ROCm 6.1.
llama-deploy: An async-first framework designed for building, iterating, and deploying multi-agent systems in production.
Llama 3.2 3B Model: A leading open-weight small language model with three billion parameters, served using vLLM with AMD ROCm optimizations for enhanced performance.
LlamaIndex: A widely-used open-source retrieval-augmented generation (RAG) framework.
bge-large-en Embeddings Model: A top-ranked text embeddings model accessible through Hugging Face APIs, known for its semantic accuracy.
MilvusDB: An open-source vector database offering high-performance embedding and similarity search capabilities.

| Summary

This implementation demonstrates how government organizations can overcome the challenges of processing large volumes of legislative bills by utilizing small language models (SLMs) and agentic retrieval-augmented generation (RAG) techniques. This solution automates critical tasks such as data collection, verification, and summarization, streamlining legislative bill analysis. By employing intelligent AI agents that operate semi-autonomously, this approach delivers timely insights and is perfectly suited for offline, batched workloads.

Powered by the Dell PowerEdge R7725 server featuring 5th Gen AMD EPYC 9755 128-Core processors, this agentic RAG-based solution highlights the capability of modern CPU-based infrastructure to efficiently manage advanced AI workflows. By operating exclusively on AMD EPYC processors, the solution enables organizations to leverage existing infrastructure, eliminating the need for expensive hardware upgrades. Additionally, it supports mixed workloads, providing enhanced versatility. This implementation has demonstrated scalability across diverse industries, extending its utility beyond legislative insights and showcasing the wide applicability of CPU-based AI solutions.

“The advantage of AI agents presents an unparalleled productivity opportunity, enabling AI to operate independently to accomplish tasks with minimal human intervention. With AI agents, we can run agentic RAG workloads or other offline, batched tasks entirely on CPUs, optimizing resource utilization and also significantly enhancing efficiency and scalability.”

Chetan Gadgil, CTO at Metrum AI

To learn more about this solution, check out our repository: https://github.com/metrum-ai/genai-legislative-insights.

| References

AMD images: AMD.com, AMD Partner Resource Library, https://www.amd.com/en/partner/resources/resource-library.html
Dell PowerEdge R7725 Rack Server [Image]. Retrieved from https://www.dell.com/en-us/shop/dell-poweredge-servers/new-poweredge-r7725-rack-server/spd/poweredge-r7725/pe_r7725_tm_vi_vp_sb
Microsoft Research. 2023. “Phi-2: The Surprising Power of Small Language Models.” Microsoft Research Blog, December 12, 2023. https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

| Addendum

| Appendix A: Key Concepts

This solution leverages the following technical concepts:

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) combines retrieval-based and generative NLP models to produce accurate, contextually relevant outputs by incorporating external knowledge from a database or corpus. Our legislative bill analysis solution leverages RAG to dynamically retrieve and integrate relevant legal, economic, and environmental data from uploaded code documents to generate detailed, fact-based summaries and insights. This ensures the solution provides timely, accurate analysis, streamlining decision-making processes for government agencies.

AI Agents and Agentic Workflows

AI agents are autonomous software tools that process information, make decisions, and take actions to achieve specific goals using techniques like machine learning and natural language processing. In agentic workflows, multiple AI agents collaborate, each with specialized roles, to break complex tasks into smaller steps for more accurate and efficient execution.

Our legislative bill analysis solution exemplifies this approach, employing AI agents for distinct tasks like legal, economic, and environmental impact assessments. These agents iteratively refine their outputs, ensuring detailed and reliable insights that support decision-making while streamlining the analysis process for government agencies.

Copyright © 2025 Metrum AI Inc. All Rights Reserved. Metrum AI, the Metrum AI logo, and other trademarks are trademarks of Metrum AI Inc. The analysis in this document was conducted by Metrum AI Inc. and commissioned by Dell Technologies.

Dell Technologies, Dell, Dell PowerEdge, Dell logo, and other trademarks are trademarks of Dell Inc. or its subsidiaries. AMD, AMD logo, AMD EPYC, AMD ROCm, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Other trademarks may be the property of their respective owners.

DISLAIMER: Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance.

Metrum AI believes the information in this document is accurate as of its publication date. The information is subject to change without notice.

Legislative & Fiscal Insights | Powered By Agentic RAG With AMD® EPYC™ Processors on Dell PowerEdge™ Servers.