December 23, 2025

From Static Models to Autonomous Agents: How AI Risk Management Must Evolve

From Static Models to Autonomous Agents: How AI Risk Management Must Evolve

Over the last decade, financial institutions have developed robust model risk management programs under frameworks like SR 11-7 and SS1/23. These standards were built for a world of static models whose behavior could be documented and validated with reasonable certainty. The rapid emergence of agentic AI systems, however, is rewriting those assumptions.

Large language models (LLMs) that plan and take autonomous action have a hard time aligning within the traditional MRM paradigm which means current frameworks are no longer enough. AI risk management have to evolve just as rapidly as the technology itself. Understanding why these frameworks fall short requires first examining what they assume, how agentic systems differ, and what risk management has to add as a result. 

The Limits of Today’s MRM Frameworks

Frameworks like SR 11-7 and SS1/23 have served institutions well by establishing disciplined controls around traditional models, including linear regressions, scorecards, and supervised machine learning systems. These frameworks assume that model behavior can be fully specified in advance. Inputs, assumptions, and performance characteristics are expected to remain stable unless a model is explicitly changed. Validation, in turn, is retrospective, focusing on conceptual soundness and implementation accuracy.

These assumptions begin to break down with agentic AI systems. LLMs are very context dependent. More importantly, agentic systems generate predictions and execute sequences of decisions and actions in live environments. Autonomy, tool-use, and adaptive behavior introduce emergent risks that cannot be captured through static documentation or validation.

Take a Deeper Dive into AI Risk: How to Proactively Manage AI Risk Before Regulators or Auditors Ask

What Makes Agentic Systems Fundamentally Different

These limitations stem from a fundamental mismatch. While large language models underpin many agentic systems, the risk profile emerges at the system level. Agentic systems therefore differ from traditional models in their ability to reason and take autonomous actions across dynamic environments.

Rather than producing a single prediction or score, agents are designed to reason, plan, and execute multi-step tasks. They can turn objectives into intermediate goals, adapting their approach halfway if needed, and change their behavior based on feedback from their environment. The system’s effective “policy” is shaped continuously by context and real-time inputs.

These systems are also integrated with tools. An agent may call APIs, generate and execute code, update records, or trigger workflows, creating direct operational and regulatory exposure. Inputs themselves are fluid and often include live data, documents, and user instructions. This openness introduces new failure modes that fall outside the assumptions of legacy MRM frameworks.

Why SR 11-7 and SS1/23 Are Insufficient for Agents

At their core, SR 11-7 and SS1/23 are model focused frameworks applied to systems that no longer behave like models. The characteristics that define agentic AI systems expose structural gaps in existing regulatory frameworks. SR 11-7 and SS1/23 emphasize conceptual soundness and outcome testing but they do not address how to validate behavior in action sequences that unfold over time. Similarly, existing controls do not anticipate autonomy or tool use. Traditional models are not expected to invoke APIs or trigger transactions without human intervention yet these capabilities are central to many agentic deployments.

Documentation requirements also fall short. Static model descriptions cannot fully capture systems whose behavior depends on live context and external, evolving data. Performance testing assumptions further break down as agents operate in non-stationary environments, requiring continuous and adaptive evaluation. Governance frameworks remain focused on individual models, offering limited guidance for managing system level risk across retrieval components and external tools.

Read a previous piece on governance: Moloch’s AI Game: Why Governance is the Ultimate Accelerator

What AI Risk Management Need to Add

Governance frameworks are required to explicitly define the system boundary, recognizing tools and actions as part of the validated scope. To govern agentic AI safely, model risk management needs to expand beyond individual models to encompass entire AI systems. Validation must become continuous and behavioral, incorporating scenario based testing with adversarial prompts and ongoing monitoring for policy drift.

Risk management must also incorporate specific capabilities and safety evaluations, including alignment testing, jailbreak resistance, refusal behavior, and detection of harmful actions. Documentation should evolve, replacing static model reports with living system cards that update as configurations or underlying models change. Strong guardrails are also critical and include policy level constraints on agent behavior and human-in-the-loop controls for high-impact decisions. Finally, governance frameworks should address vendor and model lifecycle risk in modern AI systems.

From Models to Systems: A Governance Mismatch

Agentic AI systems represent a fundamental shift from tools that calculate to systems that act more like digital co-workers. Frameworks like SR 11-7 and SS1/23 remain essential however they are insufficient on their own. To manage the autonomous and adaptive risks of AI, organizations have to broaden governance to address behavior, autonomy, and system level decision making. Institutions that evolve their risk management practices now will unlock AI’s benefits responsibly.

Company and Industry Updates, Straight to Your Inbox