Throughout history, many human inventions have been democratized as technology has advanced, from the wheel to the printing press to the smartphone. More recently, advances in computing power, programming capabilities, and data/information availability have resulted in a proliferation in the development and use of mathematical models in conventional applications like trading, credit scoring, and medical statistics, as well as in newer applications such as large language models, facial recognition technologies, and self-driving cars.
As with any other democratizing trend, this provides incredible opportunities for organizations, governments, and academic institutions of any size to use models in their operations, exploiting the true potential of big data and massive computing power to improve predictive power and, consequently, profitability and/or the public good.
However, this explosion of the use of models also brings about an accompanying increase in model risk, defined here as the consequences of using inaccurate models or of misusing models, across all domains of application. At the highest level, we can distinguish four key categories of model risk challenges.
Key Model Risk Challenge #1: Conceptual Soundness
A model is considered conceptually sound if its logic is valid and its premises are true. A modeler can take data and create a model that fits the data to show a relationship between two variables. The challenge of this approach, sometimes known as dustbowl empiricism (making empirical observations and collecting data rather than starting from a theoretical framework), is that while a model may fit the data, it may not make sense. There may not be any causal relationship, or it may just be a spurious correlation.
To quote a famous illustration that correlation is not causation: between 1999 and 2009, the number of people who drowned by falling into a swimming pool correlated strongly with the number of films Nicolas Cage appeared in that year. Does anyone really believe that one of these causes the other? Similarly, analysis of the data and the patterns within it (something machine learning does very well) may add to the modeler’s understanding; for example, two variables that appear spuriously correlated could both be driven by a third factor.
Having the expertise to know the difference is therefore critical to managing model risk.
Key Model Risk Challenge #2: Data Validity
Data used in modeling can be subject to errors, missing values, outliers, and anomalies. While, ideally, data used to build the model should be fit for purpose and collected appropriately, this often proves challenging since modelers don’t always have control over the upstream data feeds. As a result, modelers resort to various types of data “enrichment” such as cleansing, smoothing, and removal of outliers. The overall result is that a model may be prone to various biases, caused by either flawed data (historical bias, representation bias, response bias) or flawed enrichment by the modelers (confirmation bias, cognitive bias).
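As a minimal sketch of how enrichment itself can introduce bias (the values and the two-standard-deviation rule below are illustrative assumptions, not a recommended practice), consider a naive outlier-removal step:

```python
from statistics import mean, stdev

# Hypothetical sample with one extreme but possibly legitimate value.
raw = [12, 14, 15, 13, 16, 14, 200, 13, 15]

# A common cleansing rule: drop anything more than two standard
# deviations from the mean. Genuine errors are removed, but so are
# real extreme observations, shifting the resulting statistics.
mu, sigma = mean(raw), stdev(raw)
cleaned = [x for x in raw if abs(x - mu) <= 2 * sigma]

print(f"mean before cleansing: {mu:.1f}, after: {mean(cleaned):.1f}")
```

Whether the 200 was a data-entry error or a real observation determines whether this step fixed the data or biased it; the rule itself cannot tell the difference.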
A notable instance of representation bias made the news when an Amazon hiring-screening algorithm for technical jobs was found to systematically discriminate against women. The root cause of this bias was a model trained on past and current populations of technical hires that were composed primarily of men.
Therefore, appropriately designing the data, understanding the limitations within the underlying data, and mitigating the biases introduced in data remediation or enrichment is critical to managing model risk.
Key Model Risk Challenge #3: Ethical Impact
Widespread, uncontrolled use of modeling can also result in ethical issues, whether deliberate disinformation or inadvertent bias, both of which can adversely impact human beings at the individual or societal level.
Notable examples include significantly poorer accuracy for certain races, genders, and age groups in facial recognition technologies, and racial bias resulting from recidivism algorithms in the criminal justice system.
Therefore, ensuring models do not introduce deliberate or inadvertent harm through the use of continuous monitoring and outcomes analysis is critical to managing model risk.
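One way to operationalize outcomes analysis is to break model accuracy out by group rather than report a single aggregate number. A minimal sketch, where the group labels and monitoring records are hypothetical:

```python
from collections import defaultdict

# Hypothetical monitoring log: (group, prediction, actual outcome).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 1),
]

# Per-group accuracy surfaces disparities that an aggregate hides:
# overall accuracy here is 50%, but the two groups differ sharply.
hits, totals = defaultdict(int), defaultdict(int)
for group, pred, actual in records:
    totals[group] += 1
    hits[group] += int(pred == actual)

accuracy = {g: hits[g] / totals[g] for g in totals}
print(accuracy)
```

Run continuously on production predictions, this kind of breakdown is what turns monitoring into a bias-detection control rather than a single headline metric.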
Key Model Risk Challenge #4: Governance and Accountability
The proliferation of models and algorithms poses a new risk, that of reduced or non-existent accountability. In the era of traditional statistical models confined to narrow systems of application, there was a degree of accountability in that inaccurate models were the responsibility of the modeler, and inappropriate use of models was the responsibility of the user.
By contrast, when complex models and algorithms are widely embedded in everyday applications, such as large language models, it becomes incredibly challenging for any single individual or entity to understand the model and its ecosystem in its entirety. There is therefore no clear system-level accountability.
For this reason, an end-to-end understanding and governance of an organization’s use of models, spanning upstream data, vendor-provided or externally developed models and algorithms, and the end use of every model, is critical to managing model risk.
While the democratization of mathematical modeling has led to exciting opportunities, it is essential to acknowledge the challenges that come with it and to implement effective measures to manage model risk. The sooner institutions get started on strengthening their model risk management framework to adequately address the issues of conceptual soundness, data validity, ethical considerations, and governance and accountability, the sooner they will be able to get ahead of potential risks and ensure models deliver the most value for their business.
Authors: Saee Joshi (CFA, FHCA) and Mehdi Esmail © Copyright 2023 ValidMind Inc.
This post was previously published on LinkedIn.