Technical Whitepaper

AI-Assisted Student Presentation Evaluation: Transforming Academic Assessment Through Multimodal AI

Abstract

Powered by a 2-Node Dell PowerEdge™ XE7745 Server Configuration with NVIDIA RTX™ PRO 6000 Blackwell Server Edition GPUs, Broadcom® Thor 2 Ethernet Controllers, & Broadcom® Tomahawk™ 5 Ethernet Switches.

January 2026

| Executive Summary

Generative AI has undermined written assignments as reliable evidence of student learning. Universities must scale oral evaluation to restore assessment integrity, while avoiding additional strain on faculty already burdened by existing responsibilities.

This paper presents an AI-assisted presentation evaluation system that processes recorded student presentations using multimodal AI while preserving faculty oversight of evaluations. Deployed on two Dell PowerEdge XE7745 servers equipped with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, the system enables institutions to:

  • Process more than 12,000 recorded oral presentations in 24 hours with a two-node deployment, generating draft evaluations in 2–3 minutes per 8–10-minute presentation.
  • Preserve faculty authority by requiring instructor review and approval before any evaluation is released to students.
  • Maintain data sovereignty through on-premises deployment that keeps all student data within institutional control.
  • Scale linearly by adding infrastructure capacity as enrollment grows.

| Table of Contents

Introduction: The Assessment Crisis

The Challenge

Solution Overview

Solution Architecture

Performance Insights

Summary

Disclaimers and Attributions

| Introduction: The Assessment Crisis

Generative AI Has Changed Everything

Higher education is facing an existential crisis from the structural breakdown in how learning is evaluated. For decades, universities have relied on written assignments as the primary window into student understanding, assuming that what a student submits is a reasonable reflection of their independent reasoning. Generative AI has shattered that assumption. Even subject-matter experts cannot reliably distinguish AI-generated writing from student work, making written products an increasingly unreliable signal of actual learning.

Generative AI has created a fundamental problem for colleges and universities: instructors can no longer trust written assignments to reflect what students actually understand. When submitted work may not represent a student’s own thinking, instructors lose visibility into how students reason and lose the ability to help them build critical-thinking skills. Written work also stops being a reliable indicator of whether students have genuinely mastered essential material. Faculty try to compensate with more drafts, more checkpoints, and more time spent validating work, but these strategies make evaluation slower, more labor-intensive, and less reliable. The result is a widening gap between what universities aim to teach and what they can confidently assess.

The obvious solution to this wicked problem is simply to ask the students to explain an idea and walk through their reasoning, but the barrier is scale. Oral evaluation takes time, and time is a scarce resource for faculty. In courses with hundreds of students, it becomes infeasible for faculty to evaluate every student’s explanation, and for that reason, oral evaluation has fallen by the wayside in modern universities.

The innovation we present in this solution is scaling oral evaluation by combining instructor judgment with an AI partner in a unified evaluation process. The system processes recorded video presentations, analyzes what students say and show, identifies moments where genuine understanding is demonstrated, and organizes that evidence using instructor-defined criteria. Instructors remain responsible for the final decision: they review the AI’s initial draft, tailor it to educational objectives, and communicate the evaluation to students. The AI partner handles the repetitive and time-consuming steps so faculty can focus on determining whether the student actually understands the material.

This collaboration fundamentally changes the workload and cost structure of evaluating student work. It protects faculty time while making individualized evaluation possible even in large or fully online courses. Consistency improves because the AI applies criteria the same way every time before faculty review and finalize the results. Most importantly, universities regain a reliable way to see what students actually know, restoring confidence in both the learning process and the meaning of course outcomes.

The Scale Barrier

Implementing scalable oral evaluation requires infrastructure capable of supporting the demands of a faculty–AI collaboration. To make this approach viable at scale, the collaborative design must operate quickly, securely, and in parallel across thousands of submissions while keeping student data under institutional control and compliant with regulation.

This whitepaper introduces an AI-assisted presentation evaluation system designed to transform academic assessment through intelligent automation. The platform combines multimodal AI with enterprise-grade infrastructure, including Dell PowerEdge XE7745 rack servers, NVIDIA RTX PRO 6000 GPUs, as well as Broadcom Thor 2 and Broadcom Tomahawk 5 networking, to enable secure, on-premises evaluation at institutional scale.

| The Challenge

While generative AI has disrupted traditional written assessment, the deeper challenge for universities is obtaining reliable evidence of student learning. Large, multi-college universities must demonstrate what students actually understand across hundreds of programs, modalities, and degree types. Yet the diversity that defines modern higher education makes consistent, reliable assessment extremely difficult to achieve at scale.

A university is not a single-domain environment. It is an ecosystem of disparate disciplines (engineering, business, health sciences, arts, and humanities), each with its own communication norms, assignment structures, and expectations for demonstrating understanding. This diversity is academically necessary but operationally challenging: a one-size-fits-all approach cannot capture or evaluate learning across such varied programs.

This challenge is magnified by the expectations placed on universities. Program reviews, accreditation cycles, and external stakeholders increasingly require credible, auditable evidence of learning quality. Institutions must show that course outcomes are meaningful and consistently interpreted across sections, departments, and campuses. Yet achieving consistency across such variation requires solutions that accommodate disciplinary differences rather than override them or encroach on academic freedom.

AI changes this equation. Modern multimodal AI models, and the environments in which they are embedded, are inherently domain-agnostic—they can analyze communication, reasoning, organization, and clarity regardless of disciplinary context. When paired with instructor-designed rubrics and faculty oversight, AI becomes a scalable evaluation partner capable of handling institutional diversity without enforcing uniformity. It supports consistent processing across thousands of submissions while allowing each program to retain its own pedagogical identity.

For universities seeking to regain a reliable view of student understanding across their full range of programs, AI-assisted evaluation, applied here to oral presentations, represents one promising path forward. It offers a way to generate credible, comparable, and auditable evidence of learning at the scale and speed that institutional operations require, while preserving the autonomy and expertise of faculty.

Challenge and Solution at a Glance

| Challenge | Solution |
| --- | --- |
| Generative AI undermines written assessment | Oral evaluation reveals authentic understanding |
| Manual grading doesn't scale | AI processes thousands of presentations in hours |
| Cloud AI creates data sovereignty risks | On-premises deployment with complete institutional control |
| Inconsistent evaluation across sections and modalities | Rubric-aligned AI ensures consistent criteria application |

Table 1 | Challenge and Solution Summary

| Solution Overview

This solution scales oral evaluation by combining instructor judgment with an AI partner in a unified evaluation process. The system processes recorded video presentations, analyzes what students say and show, identifies moments where genuine understanding is demonstrated, and organizes that evidence using instructor-defined criteria.

Rather than replacing instructor judgment, the platform serves as an evaluation support tool. It assists with content extraction, rubric alignment, and draft feedback generation, while ensuring faculty approval before any evaluation is finalized or released.

Core Operations

At a functional level, the platform performs three observable operations:

  1. Extracts multimodal artifacts from recorded presentations (including audio transcripts, slide content, visual elements)
  2. Applies instructor-designed rubrics to extracted content using AI-powered evaluation
  3. Surfaces draft evaluations and supporting evidence for faculty review and approval

Each operation produces intermediate outputs that can be inspected, logged, and audited by instructors or administrators.
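
The sketch below illustrates how these three operations can be chained so that each stage emits an inspectable, loggable artifact. The function names, artifact fields, and logging scheme are illustrative assumptions, not the product's actual API.

```python
# Minimal sketch of the three-stage evaluation pipeline (illustrative only).
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("evaluation_pipeline")

def extract_artifacts(video_path: str) -> dict:
    """Stage 1: extract transcript, slides, and visual elements (stubbed)."""
    return {"source": video_path, "transcript": [], "slides": []}

def apply_rubric(artifacts: dict, rubric: dict) -> dict:
    """Stage 2: produce criterion-level draft scores from extracted content (stubbed)."""
    return {"criteria": [{"name": name, "score": None, "rationale": ""} for name in rubric]}

def surface_for_review(draft: dict) -> dict:
    """Stage 3: mark the draft as preliminary and queue it for faculty review."""
    return {**draft, "status": "pending_faculty_review"}

# Each intermediate output is logged so instructors and admins can audit it.
artifacts = extract_artifacts("submission_001.mp4")
log.info("artifacts: %s", json.dumps(artifacts))
draft = apply_rubric(artifacts, {"Clarity": {}, "Evidence": {}})
log.info("draft: %s", json.dumps(draft))
result = surface_for_review(draft)
log.info("result status: %s", result["status"])
```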

From Raw Presentations to Structured Artifacts

Each student submission is treated as a multimodal input consisting of video, audio, and visual content. The system processes these components independently and stores the extracted artifacts for subsequent evaluation and review.

Figure 1 | Multimodal Presentation Processing Pipeline

Audio tracks are extracted from uploaded video files and transcribed into timestamped text. Visual frames are extracted at detected slide boundaries and analyzed to identify visible text and layout structure. These outputs are stored as discrete artifacts rather than combined into a single opaque representation.
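
As a concrete illustration of this stage, the following sketch extracts an audio track and produces a timestamped transcript, assuming ffmpeg is on the PATH and the open-source openai-whisper package is installed. Fixed-interval frame sampling stands in for the system's actual slide-boundary detection.

```python
import os
import subprocess
import whisper  # pip install openai-whisper

VIDEO = "submission_001.mp4"  # hypothetical submission file

# Extract a mono 16 kHz audio track suitable for speech models.
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-vn", "-ac", "1", "-ar", "16000", "audio.wav"],
    check=True,
)

# Transcribe into timestamped segments, stored as discrete artifacts.
model = whisper.load_model("base")
segments = model.transcribe("audio.wav")["segments"]
transcript = [
    {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
    for s in segments
]

# Sample one frame every 5 seconds for downstream visual analysis
# (a stand-in for true slide-boundary detection).
os.makedirs("frames", exist_ok=True)
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-vf", "fps=1/5", "frames/frame_%04d.png"],
    check=True,
)
```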

This multimodal separation is essential for academic transparency and defensibility. In contrast to black-box AI systems, this design lets instructors trace every evaluation claim back to specific evidence: a timestamp in the transcript, a visible slide element, or a structural pattern. When a student contests a score, faculty can review the exact evidence chain rather than having to justify an opaque algorithmic decision.

Before generating draft evaluations, the system performs two critical preprocessing steps:

Content Alignment: The system correlates spoken content with visual evidence, mapping transcript segments to corresponding slides. This speech-slide mapping ensures that evaluation rationales can reference both what students said and what they showed.

Guardrails Validation: The system verifies that scores fall within the defined ranges, that evidence citations reference actual transcript segments or slides, that rationales meet minimum length requirements for substantive feedback, and that the overall evaluation is complete. It also assigns confidence scores as evaluation-quality indicators. Institutions can configure thresholds to flag evaluations that require additional faculty review, ensuring human oversight for ambiguous or challenging assessments.
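
A minimal sketch of both preprocessing steps follows. The field names and the 120-character rationale threshold are assumptions chosen for illustration, not the shipped schema or defaults.

```python
def align(transcript: list[dict], slides: list[dict]) -> list[dict]:
    """Map each transcript segment to the slide on screen at that time."""
    mapping = []
    for seg in transcript:
        shown = [s for s in slides if s["shown_at"] <= seg["start"]]
        mapping.append({"segment": seg, "slide_id": shown[-1]["id"] if shown else None})
    return mapping

def validate(evaluation: dict, rubric: dict, known_evidence: set,
             min_rationale_chars: int = 120) -> list[str]:
    """Return a list of guardrail violations; an empty list means the draft passes."""
    issues = []
    for item in evaluation["criteria"]:
        lo, hi = rubric[item["name"]]["range"]
        if not lo <= item["score"] <= hi:
            issues.append(f"{item['name']}: score out of range")
        if any(ev not in known_evidence for ev in item["evidence"]):
            issues.append(f"{item['name']}: evidence does not reference a real artifact")
        if len(item["rationale"]) < min_rationale_chars:
            issues.append(f"{item['name']}: rationale too short for substantive feedback")
    return issues
```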

Rubric-Aligned Draft Evaluation

Evaluation criteria are defined by instructors through configurable rubrics associated with each assignment. Rubrics specify criteria names, scoring ranges, weights, and descriptive guidance. These configurations are stored and versioned within the system.

Figure 2 | Example Rubric Configuration and Learning Objective Definition
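
To make the rubric structure concrete, here is one way such a configuration might look in Python. The schema and the example criteria are illustrative assumptions based only on the fields named above (criteria names, scoring ranges, weights, descriptive guidance, and versioning).

```python
# Illustrative rubric configuration for a presentation assignment.
rubric = {
    "assignment": "FIN-441 Capstone Presentation",  # hypothetical course
    "version": 3,
    "criteria": [
        {
            "name": "Conceptual Understanding",
            "range": [0, 10],
            "weight": 0.40,
            "guidance": "Explains the core mechanism in the student's own words.",
        },
        {
            "name": "Use of Evidence",
            "range": [0, 10],
            "weight": 0.35,
            "guidance": "Claims are supported by data shown on slides.",
        },
        {
            "name": "Clarity and Organization",
            "range": [0, 10],
            "weight": 0.25,
            "guidance": "Logical flow from problem statement to conclusion.",
        },
    ],
}
```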

Using these rubric definitions, the system generates draft, criterion-level evaluations by analyzing the previously extracted artifacts. For each criterion, the system produces:

  • A proposed score within the defined range
  • A text rationale referencing extracted content
  • Evidence pointers linking the rationale to transcript timestamps or slide identifiers

These draft evaluations are explicitly flagged as preliminary throughout the system workflow. No score, comment, or feedback becomes visible to students until an instructor has reviewed, modified if necessary, and approved each evaluation through an explicit action logged by the system.
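
A sketch of what one criterion-level draft record might contain is shown below. The identifiers and field names are hypothetical, but the structure mirrors the three outputs listed above: a proposed score, a rationale, and evidence pointers.

```python
# Hypothetical draft record produced by the rubric-reasoning stage.
draft_evaluation = {
    "submission_id": "sub_001",   # hypothetical identifier
    "status": "draft",            # never visible to students in this state
    "criteria": [
        {
            "name": "Conceptual Understanding",
            "score": 8,
            "rationale": "The student derives the result rather than reciting it, "
                         "stating each assumption before applying it.",
            # Pointers back to transcript timestamps and slide identifiers.
            "evidence": ["transcript:00:03:12-00:04:05", "slide:4"],
        },
    ],
}
```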

Faculty Review & Approval

All draft evaluations are presented to faculty within a review workspace that aggregates relevant evidence and generated outputs.

Figure 3 | Faculty Review Workspace with Evidence and Draft Evaluation

Faculty can:

  • Review the original submission alongside extracted artifacts.
  • Inspect evidence references used in draft rationales.
  • Modify scores, rationales, and qualitative feedback.
  • Approve or reject draft evaluations before release.

The system records the AI-generated draft, all faculty edits and overrides, and the final approved evaluation state. This data is retained as part of the system's audit trail and can be queried for administrative review, quality assurance, or research purposes.
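
One plausible shape for this audit trail is an append-only event log, sketched below; the event names and actor identifiers are hypothetical.

```python
# Append-only audit trail for one submission: the AI draft, each faculty
# override, and the explicit approval are separate immutable events.
from datetime import datetime, timezone

audit_log: list[dict] = []

def record(event: str, actor: str, payload: dict) -> None:
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "actor": actor,
        "payload": payload,
    })

record("draft_generated", "system", {"criterion": "Use of Evidence", "score": 6})
record("score_overridden", "instructor_42", {"criterion": "Use of Evidence", "score": 7})
record("evaluation_approved", "instructor_42", {"released_to_student": True})
```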

Student Experience

While the system primarily reduces faculty workload, students also benefit significantly from AI-assisted evaluation:

  • Faster feedback turnaround: Students receive detailed, criterion-specific feedback within days rather than weeks—accelerating the learning cycle and enabling meaningful revision before subsequent assignments.
  • Detailed, actionable comments: AI-generated drafts provide specific evidence references, helping students understand exactly where they demonstrated understanding and where they may have fallen short.
  • Consistent evaluation standards: Rubric-aligned evaluation ensures all students are assessed against the same criteria, reducing concerns about grading fairness across sections.
  • Transparency into evaluation reasoning: Evidence pointers allow students to see exactly which moments in their presentation supported each score, demystifying the evaluation process.

By shifting faculty time from watching videos to reviewing AI-prepared drafts, instructors can invest more energy in qualitative feedback, student mentoring, and curriculum refinement—improving the overall educational experience.

Scalability Model and Evaluation Scope

The system is designed to scale by separating automated processing from faculty-controlled evaluation actions.

Automated stages, including video ingestion, audio transcription, slide extraction, and rubric-based draft evaluation, execute asynchronously after student submission and complete without faculty interaction. These stages process submissions in the background and update assignment status as evaluations become available for review.

Faculty interaction is intentionally limited to rubric design, review, override, and approval steps. Instructors inspect AI-generated scores and feedback, make edits where necessary, and explicitly approve evaluations before results are released to students or synchronized with the LMS. This approval step is required for every submission and is recorded by the system.

In practice, this separation enables realistic academic workflows. A faculty member teaching a 200-student section can assign a presentation, and the automated pipeline processes submissions overnight, generating draft evaluations by the following morning. The instructor then reviews each evaluation in 2-3 minutes rather than watching 20+ minute presentations in full, making individualized oral assessment operationally feasible even in large enrollment courses.

This design allows automated processing capacity to scale with available infrastructure without altering instructional responsibility or grading authority. The performance characteristics of the automated pipeline, including throughput scaling behavior and concurrent processing capacity, are detailed in the Performance Insights section below.

| Solution Architecture

The AI-assisted presentation evaluation system is implemented as an on-premises, containerized application stack that integrates multimodal AI inference, workflow orchestration, and faculty-facing review interfaces within institution-controlled infrastructure.

Figure 4 | End-to-End Solution Architecture

Architectural Layers

As shown in Figure 4, the architecture is organized around three primary layers:

  1. User Interaction Layer: Role-specific portals for administrators, professors, and students.
  2. Application and Orchestration Layer: Workflow management, job scheduling, and service coordination.
  3. AI Inference and Infrastructure Layer: GPU-accelerated model serving and data processing.

This separation allows automated processing to scale independently while preserving explicit academic control points.

Logical Architecture and Responsibilities

Table 2 below summarizes the primary architectural layers shown in Figure 4 and their corresponding responsibilities. Each layer maps directly to labeled components in the diagram and to the observable behavior in the demo workflow.

| Layer | Key Components | Responsibility |
| --- | --- | --- |
| User Interaction | Admin/Professor/Student Portals | Provides role-specific interfaces for managing courses, reviewing evaluations, and accessing feedback |
| Authentication | OAuth2 Proxy, Supabase, Keycloak | Verifies user identity and enforces role-based access rules |
| API Gateway | NGINX | Directs requests to appropriate backend services |
| Workflow Orchestration | Celery, Valkey, Distributed Job Scheduler | Manages asynchronous processing of submissions and coordinates evaluation tasks |
| Evaluation Control | System Prompt Engine, Guardrails Engine | Applies instructor-defined rubrics and enforces evaluation constraints |
| Data Management | Supabase Database | Stores submissions, extracted artifacts, drafts, and final approved evaluations |

Table 2 | Logical Architecture and Component Responsibilities

This structure allows automated processing stages to operate independently while preserving explicit faculty oversight at evaluation and release points.
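
Given that Table 2 names Celery and Valkey for workflow orchestration, the sketch below shows how submissions might fan out as asynchronous task chains. The task names and broker URLs are assumptions (Valkey speaks the Redis protocol, so a redis:// URL applies), and the task bodies are stubs.

```python
from celery import Celery, chain

app = Celery("evaluations",
             broker="redis://valkey:6379/0",
             backend="redis://valkey:6379/1")

@app.task
def transcribe(video_path: str) -> dict:
    return {"video": video_path, "transcript": []}

@app.task
def analyze_slides(artifacts: dict) -> dict:
    artifacts["slides"] = []
    return artifacts

@app.task
def generate_draft(artifacts: dict) -> dict:
    return {"submission": artifacts["video"], "status": "pending_faculty_review"}

# Each submission runs as an independent chain in the background; faculty
# review happens later, entirely outside this automated path.
def enqueue(video_path: str):
    return chain(transcribe.s(video_path), analyze_slides.s(), generate_draft.s())()
```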

Multimodal AI Inference Pipeline

Automated analysis is performed through a staged multimodal pipeline executed entirely within the on-premises environment. Each stage produces intermediate artifacts that are stored and later surfaced to faculty during review.

| Stage | Model | Input | Output | Purpose |
| --- | --- | --- | --- | --- |
| Audio Transcription | Whisper | Extracted audio | Timestamped transcript | Captures spoken content |
| Visual Analysis | Qwen/Qwen3-VL-30B-A3B-Instruct-2507 (Vision) | Video frames | Slides + OCR text | Identifies visual evidence |
| Content Alignment | Alignment Engine | Transcript + Slides | Speech-slide mapping | Correlates spoken and visual content |
| Rubric Reasoning | Qwen/Qwen3-30B-A3B-Instruct-2507 | Aligned content + rubric | Draft scores + rationales | Applies evaluation criteria |
| Inference Runtime | vLLM + CUDA 13 | Model requests | Parallel responses | Enables concurrent processing |

Table 3 | Multimodal AI Inference Pipeline
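
Since Table 3 names vLLM as the inference runtime and Qwen3-30B-A3B-Instruct-2507 as the rubric-reasoning model, the sketch below shows batched draft generation through vLLM's offline API. The tensor-parallel setting and the prompt template are assumptions, not the system's actual serving configuration.

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size=8 assumes the 8-GPU node described later.
llm = LLM(model="Qwen/Qwen3-30B-A3B-Instruct-2507", tensor_parallel_size=8)
params = SamplingParams(temperature=0.2, max_tokens=2048)

# Illustrative prompt; the real system injects the rubric and aligned content.
prompts = [
    "Using the rubric below, score the aligned transcript and slides.\n"
    "<rubric>...</rubric>\n<content>...</content>"
]

# vLLM batches and schedules requests internally, which is what enables the
# concurrency scaling reported in the Performance Insights section.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```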

Security, Privacy, and Compliance

All processing is deployed within institution-controlled infrastructure. Media files, extracted artifacts, draft evaluations, and final assessments are stored on systems administered by the institution.

Authentication and Access Control

Authentication integrates with institutional identity providers via OAuth2 Proxy, ensuring that login credentials never leave university control. Role-based access controls enforce strict separation between student, faculty, and administrative permissions at both the application and database layers.

Once authenticated, users access only the data and functions appropriate to their role:

  • Students see their own submissions and approved feedback
  • Faculty see submissions and draft evaluations for their assigned courses
  • Administrators access aggregate data and system health metrics without visibility into individual student work
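
A minimal sketch of this role separation follows; the permission names are illustrative, and the real system enforces equivalent rules at both the application and database layers.

```python
# Illustrative role-to-permission mapping for the three roles above.
PERMISSIONS = {
    "student": {"view_own_submission", "view_approved_feedback"},
    "faculty": {"view_course_submissions", "edit_draft", "approve_evaluation"},
    "admin":   {"view_aggregate_metrics", "view_system_health"},
}

def authorize(role: str, action: str) -> bool:
    return action in PERMISSIONS.get(role, set())

assert authorize("faculty", "approve_evaluation")
assert not authorize("admin", "view_course_submissions")  # no individual work
```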

Data Sovereignty and Compliance

At the infrastructure level, all components, including application servers, inference engines, and data stores, run on institution-owned hardware within university data centers. No student video, transcript, or evaluation data is transmitted to external services, cloud providers, or third-party APIs.

This on-premises deployment model supports compliance with FERPA, institutional data governance policies, and international data sovereignty requirements without requiring custom legal agreements or external certifications.

Audit Trail for Accreditation

The system maintains a complete audit trail of all evaluation activity. Quality assurance reviews can identify systematic patterns, accreditation processes can document evaluation consistency across sections, and academic integrity investigations can reconstruct the complete assessment history for any submission.

Infrastructure and Deployment

The solution is deployed entirely on institution-controlled, on-premises infrastructure designed for high-performance AI workloads, as shown at the base of Figure 4. Table 4 summarizes the deployment stack used in the demo and reference implementation.

| Layer | Technology | Function |
| --- | --- | --- |
| Operating System | Ubuntu 24.04 | Base OS for all services |
| Containerization | Docker Compose | Isolates and deploys application and inference services |
| GPU Acceleration | NVIDIA RTX PRO 6000 Blackwell Server Edition | Executes parallel AI inference workloads |
| Compute Platform | Dell PowerEdge XE7745 (2 nodes) | Hosts application and inference services |
| Network Interface | Broadcom Thor 2 (BCM57608) Ethernet Controllers | High-bandwidth server connectivity |
| Network Switch | Dell PowerSwitch Z9864F-ON | Connects multi-node deployment with low-latency switching |
| Switching Fabric | Broadcom Tomahawk 5 Ethernet Switches | Powers high-bandwidth data movement between nodes |

Table 4 | Infrastructure and Deployment Stack

System operational behavior, including service health, task execution, and inference activity, can be inspected through logs and dashboards, as demonstrated in the demo environment.

| Performance Insights

Inference Throughput

Performance analysis began with LLM throughput benchmarking using the Qwen3-30B-A3B-Instruct model. Tests used 2048 input tokens with output lengths of 2048 and 512 tokens to characterize token-generation capacity for both single-node and multi-node configurations of the Dell PowerEdge XE7745 servers.

Figure 5 | Single-Node Inference Throughput vs. Concurrency

Figure 5 shows inference throughput, measured in output tokens per second, as concurrency increases on a single Dell PowerEdge XE7745 node equipped with 8 NVIDIA RTX PRO 6000 GPUs. Throughput increases from 1,207 tokens/second at 32 concurrent requests to 6,155 tokens/second at 1024 concurrent requests with 2048 output tokens, demonstrating how parallel request execution scales within a fixed hardware configuration.

These measurements establish the baseline inference capacity available for the multimodal AI pipeline that generates draft evaluations.

Figure 6 | Multi-Node Inference Throughput vs. Concurrency

Figure 6 presents the same inference throughput measurements with workloads distributed across two XE7745 nodes (16 total GPUs). Peak throughput reaches 11,830 tokens/second at 1024 concurrent requests with 2048 output tokens, approaching double the single-node baseline and demonstrating near-linear scaling when inference capacity is added.

Together, Figures 5 and 6 demonstrate how automated inference capacity changes when scaling from a single-node to a multi-node deployment.
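
The near-linear claim can be checked directly from the quoted figures; the short calculation below is just arithmetic on the numbers reported above.

```python
# Scaling check using the throughput figures quoted above.
single_node = 6_155   # tokens/s, one node, 1024 concurrent requests
multi_node = 11_830   # tokens/s, two nodes, same settings

efficiency = multi_node / (2 * single_node)
print(f"scaling efficiency: {efficiency:.1%}")  # ~96.1% of a perfect 2x
```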

Application Level Throughput and Multi-Node Scaling

The performance results presented in this section characterize the automated processing pipeline of the AI-assisted presentation evaluation system. Measurements focus on background processing stages, including transcription, multimodal analysis, and rubric-based draft evaluation, that execute prior to faculty review.

Figure 7 | Application-Level Evaluation Throughput Under Concurrent Submissions. *During benchmarking, a build using Gemma-3-27b for OCR operations was evaluated.

Figure 7 translates inference-level performance into application-level behavior by measuring completed evaluations per hour and total video hours processed as concurrent submissions increase. At 96 concurrent videos, the single-node system completes 254 evaluations per hour while processing 35.3 video hours per clock hour.

This means the system processes video content significantly faster than real-time playback. For a typical 10-minute student presentation, the automated pipeline—from ingestion through draft evaluation generation—completes in approximately 2-3 minutes of wall-clock time when operating at measured capacity. This processing speed enables overnight batch processing of large submission volumes, ensuring draft evaluations are ready for faculty review by the following morning.

Figure 8 | Application-Level Evaluation Throughput: Single-Node at 150 Concurrent Videos vs. Multi-Node at 300 Concurrent Videos. *During benchmarking, a build using Gemma-3-27b for OCR operations was evaluated.

Figure 8 compares end-to-end automated evaluation throughput between single-node and multi-node deployments. The multi-node configuration delivers 517.5 evaluations per hour (68.4 video hours processed per clock hour)—a 76% increase over the single-node baseline of 294.3 evaluations per hour.

This performance scaling allows institutions to match infrastructure capacity to their assessment workload. A single-node deployment can support multiple concurrent courses with hundreds of students each, while institutions with higher simultaneous demand—such as synchronized assignment deadlines across many sections—can deploy multi-node configurations for proportionally higher throughput.
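
The quoted figures also make the scaling and real-time claims easy to verify directly:

```python
# Worked check of the application-level numbers quoted above.
single = 294.3   # evaluations/hour, single node, 150 concurrent videos
multi = 517.5    # evaluations/hour, two nodes, 300 concurrent videos
print(f"multi-node gain: {multi / single - 1:.0%}")  # ~76%

# 68.4 video hours processed per clock hour means the two-node system
# ingests content roughly 68x faster than real-time playback, in aggregate
# across concurrent streams.
print(f"aggregate real-time factor: {68.4:.1f}x")
```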

Operational Implications

The performance characteristics above translate into practical institutional capacity:

  • Single-node deployment: Processes 6,000+ evaluations in a 24-hour period, supporting large individual courses or multiple medium-sized courses with staggered deadlines.
  • Multi-node deployment: Processes 12,000+ evaluations in a 24-hour period, accommodating simultaneous deadline surges across many courses or supporting institution-wide assessment initiatives.

These throughput levels enable automated processing to stay ahead of faculty review capacity. Even with hundreds of concurrent submissions, draft evaluations become available for instructor review within hours rather than days, maintaining rapid feedback cycles regardless of submission volume.

Key Takeaways

  • Single-node performance scales efficiently: Processing capacity more than triples between baseline and peak concurrency.
  • Multi-node deployment delivers near-linear scaling: Adding infrastructure provides proportional capacity gains.
  • Infrastructure performance directly impacts academic operations: Higher throughput enables overnight batch processing, ensuring drafts are ready for faculty review by morning.
  • Faculty review remains outside the performance-critical path: Instructors can review at their own pace without affecting system throughput.

Institutional Benefits

Scalability without proportional staffing: Institutions can expand oral assessment across more courses without hiring additional grading staff or overloading existing faculty.

Consistency for accreditation: Rubric-aligned evaluation provides auditable evidence of consistent standards across sections and modalities—valuable for program reviews and accreditation cycles.

Reduced reliance on cloud AI: On-premises deployment eliminates per-API-call costs and data privacy concerns associated with cloud-based AI services.

Future-ready assessment infrastructure: The same platform can support additional AI-assisted workflows as institutional and pedagogical needs evolve.

| Summary

AI-assisted presentation evaluation represents a transformative opportunity for universities to scale assessment capabilities while improving consistency, reducing turnaround time, and maintaining rigorous academic standards. To realize these benefits, sophisticated AI models must be paired with thoughtful faculty oversight, backed by enterprise-grade infrastructure and comprehensive security.

The solution presented in this whitepaper combines cutting-edge multimodal AI processing with proven Dell, NVIDIA, and Broadcom infrastructure to deliver a complete, production-ready evaluation system. By deploying on premises with Dell PowerEdge XE7745 servers equipped with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs and connected through high-performance Broadcom Thor 2 network adapters, Dell PowerSwitch, and Tomahawk 5 switching fabric, institutions maintain complete control over sensitive student data while achieving the scale and performance required for large academic programs.

The architecture ensures AI serves as an intelligent assistant rather than a replacement for instructor judgment. Faculty reclaim time previously spent on repetitive analysis and evidence gathering, allowing them to focus on qualitative feedback, student mentoring, and curriculum refinement. Students benefit from rapid, detailed feedback that accelerates learning and improvement.

As universities worldwide seek to leverage artificial intelligence responsibly and effectively, this AI-assisted presentation evaluation system provides a proven pathway forward—one that delivers measurable operational benefits while upholding the standards of academic rigor, fairness, and security that higher education demands.


About Metrum AI

Metrum AI delivers enterprise-grade artificial intelligence solutions that combine cutting-edge machine learning with proven infrastructure partnerships. Our platform specializes in multimodal AI processing for complex organizational workflows, enabling automation while maintaining human oversight and control.

For more information or to discuss deployment in your organization, contact us at contact@metrum.ai


About Jonathan Kalodimos, PhD

Jonathan Kalodimos is an Associate Professor of Finance and Harley & Brigitte Smith Fellow in the College of Business at Oregon State University. He holds a PhD in Finance from the University of Washington Foster School of Business.

Dr. Kalodimos’ work integrates empirical research, policy experience, and applied analysis. Before joining Oregon State University, he served as a financial economist at the U.S. Securities and Exchange Commission, where he contributed to regulatory initiatives including Dodd-Frank Act Section 954 on executive compensation clawbacks. His research on corporate governance, regulatory design, and financial decision-making has been cited in major outlets such as The Wall Street Journal, The New York Times, Bloomberg, and the Harvard Business Review, reflecting its relevance to both academic and industry audiences.

Dr. Kalodimos brings a deep commitment to rigorous, evidence-based analysis and the practical application of data and technology to complex institutional challenges.


Disclaimers and Attributions

Copyright © 2026 Metrum AI, Inc. and Jonathan Kalodimos. All Rights Reserved.

This work is jointly authored, and each copyright holder retains the right to use, reproduce, distribute, and create derivative works from this material without requiring permission from the other copyright holders. Jonathan Kalodimos and Metrum AI may distribute or reproduce this work for academic, research, instructional, commercial, or marketing purposes, provided proper attribution is maintained.

This project was commissioned by Dell Technologies. Dell, PowerEdge, and other trademarks are trademarks of Dell Inc. or its subsidiaries. NVIDIA and RTX are trademarks of NVIDIA Corporation. Broadcom, Thor, and Tomahawk are trademarks of Broadcom Inc. All other product names are the trademarks of their respective owners.

Performance Disclaimer: Performance varies depending on hardware and software configurations, including testing conditions, system settings, application complexity, data volume, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are for informational purposes only and should not be considered a guarantee of actual performance. During benchmarking, a build of the application using Gemma-3-27b for OCR operations was evaluated. The results showed ample overhead for more compute-intensive analysis, leading to the decision to upgrade the simple OCR process to a full Vision Language Model analysis using Qwen3-VL-30B-A3B-Instruct in the current build.

Glossary

| Term | Definition |
| --- | --- |
| FERPA | Family Educational Rights and Privacy Act. U.S. federal law protecting the privacy of student education records. |
| Guardrails Engine | System component that validates content completeness and alignment with academic standards before evaluation. |
| LLM | Large Language Model. AI models trained on vast text corpora, enabling natural language understanding and generation. |
| LMS | Learning Management System. Software platform for delivering, tracking, and managing educational courses (e.g., Canvas, Blackboard). |
| Multimodal AI | AI systems that process multiple types of input (text, audio, images, video) simultaneously. |
| OAuth2 | Industry-standard protocol for authorization, enabling secure integration with institutional identity providers. |
| OCR | Optical Character Recognition. Technology that extracts text from images or video frames. |
| On-Premises | Deployment model where all hardware and software runs within institution-controlled data centers rather than external cloud services. |
| Rubric | Structured evaluation criteria defining scoring ranges, weights, and descriptive guidance for assessment. |
| Valkey | Open-source in-memory data store used for job queuing and task coordination. |
| vLLM | High-throughput inference engine optimized for serving large language models. |

Table 5 | Glossary of Terms