AIR Currents

Mar 16, 2010

Editor’s Note: Since catastrophe models were first introduced in the late 1980s, the increase in computing power by several orders of magnitude and the relentless pace of scientific research have made possible dramatically more sophisticated simulation processes. The growth in detail and complexity of the models, as well as in the sheer number of parameters, may give the sense that models should be more precise than ever. While catastrophe risk management has undoubtedly come a long way, translating model results into informed decision making requires a balanced understanding of uncertainty in model assumptions and parameters, and a judicious awareness of the limitations of modeling. In this article, Dr. Jayanta Guin, AIR Senior Vice President, Research, will introduce some key concepts and answer the fundamental question: What is uncertainty, where does it come from, and how is it treated in models? In the upcoming months, the discussion will continue with articles focused on uncertainty in hurricane and earthquake loss estimation.

Understanding Uncertainty

Real world systems are immensely complex, and models that attempt to simulate them are essentially simplified mathematical representations of physical phenomena. This process of simplification introduces both possibilities and pitfalls. Probabilistic catastrophe models use science and statistics to make sense of seemingly random and unpredictable events in nature, allowing us to essentially prepare for the unknowable. On the other hand, how do we know if our models properly describe the physical world? How good is the data used to develop the models and the data input into the models? Have we simplified too much?

All of these concerns fall under the notion of uncertainty, which at a conceptual level can be categorized into two broad types: epistemic (from the Greek episteme, or knowledge) and aleatory (from the Latin alea, a game of dice). Epistemic uncertainty results from an incomplete or inaccurate scientific understanding of the underlying process. In theory, as more knowledge or data becomes available, epistemic uncertainty should go to zero. Aleatory uncertainty, on the other hand, is a result of statistical variability. It is attributed to intrinsic randomness and is not reducible as more data is collected for a given model. In practice, the distinction between these two types of uncertainty is not always clear, as there are situations where apparent randomness is actually a result of lack of knowledge.

Whatever its source, uncertainty ultimately imposes limitations on the accuracy of the model’s output. However, uncertainty is not confined to final modeling results. It is present in each component of the modeling framework, both in models and in model parameters (see table below).

Uncertainty: Figure 1

Dr. Jayanta Guin

Aleatory, epistemic, primary, and secondary: these are all terms commonly used when discussing uncertainty in catastrophe models, but why do they really matter? This article gives a practical overview of how AIR models treat uncertainty and provides context for what it means when making business decisions based on model output.

How Is Uncertainty Addressed in Catastrophe Models?

In discussing uncertainty, different disciplines prefer different terminology. Scientists apply the terms epistemic and aleatory uncertainty to their understanding of physics-based phenomena. Actuaries and statisticians, who deal with the frequency and severity of potential events, tend to prefer the terms model and parametric uncertainty.

To better conceptualize uncertainty in catastrophe models, which combine complex probabilistic and physical submodels with statistical and actuarial science, it is necessary to introduce two more terms: primary and secondary uncertainty. Primary uncertainty refers to uncertainty in the event generation component of the model (in other words, in the event catalog). Secondary uncertainty is uncertainty in the damage estimation. Both types have elements of epistemic/aleatory as well as model/parametric uncertainty.

Figure 1. Primary uncertainty (including sampling variability) concerns the event generation component of the model, while secondary uncertainty concerns intensity, damage, and loss estimation. Both types have elements of epistemic/aleatory as well as model/parametric uncertainty. (Source: AIR)

Primary Uncertainty

In constructing an event catalog that reliably reflects the potential risk from future events, the main sources of epistemic uncertainty are data quality, data completeness, and incomplete scientific understanding of the natural phenomenon being modeled. The historical record for events that predate modern instrumentation is considerably less reliable, and smaller intensity events are more likely to have gone unrecorded. Furthermore, large intensity events are rare, so relying on the historical record can misrepresent the tail risk from low frequency but high impact events.

To address this uncertainty, AIR scientists and statisticians construct stochastic event catalogs by fitting probability distributions to the historical data for each event parameter (for example, magnitude of an earthquake or minimum central pressure of a cyclone). Due diligence requires reviewing, processing, and validating data from multiple sources. Additionally, where available, geophysical information (such as GPS observations or fault trenching data) is used to supplement historical data.

Using the same set of observable data—but different underlying assumptions for processes that are not directly measurable (such as the time dependency of fault rupture probability, or the link between warm sea surface temperatures and increased hurricane landfall frequencies)—alternate credible views of risk may be possible. AIR offers multiple stochastic catalogs for certain models, such as standard and climate-conditioned catalogs for the U.S. hurricane model, and time-dependent and time-independent earthquake catalogs for the U.S. and Japan. In the absence of a clear consensus in the scientific community, a multiple-catalog approach better captures the most current state of knowledge.

The process of creating event catalogs suitable for practical computational platforms introduces another type of primary uncertainty, called sampling variability, which is associated with catalog size. A catalog with more scenario years (100,000) has inherently less sampling variability than a smaller catalog (50,000 or 10,000 years) because it better reflects the full range of possible outcomes for the upcoming year. While this source of variability can in theory be eliminated by drawing ever larger samples of events, for the purposes of computational efficiency and workflow requirements (a larger catalog translates to longer analysis times), it is desirable to statistically constrain the catalog. AIR uses various techniques to sample a smaller set of events that provides a reasonable approximation of the results obtained using a larger set.
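The effect of catalog size on sampling variability can be illustrated with a toy simulation. The sketch below is not AIR's catalog-generation or sampling technique; the event rate, the loss distribution, the catalog sizes (scaled down so the script runs quickly), and all function names are assumptions made purely for illustration. It builds many independent catalogs of a given size and measures how much the estimated average annual loss differs from one catalog to the next.

```python
import random
import statistics

def catalog_average_loss(n_years, rng, rate=0.1, mean_loss=100.0):
    """Average annual loss from one simulated catalog of n_years scenario years.

    Assumed toy model: at most one event per year, occurring with
    probability `rate`, with an exponentially distributed loss.
    """
    total = 0.0
    for _ in range(n_years):
        if rng.random() < rate:
            total += rng.expovariate(1.0 / mean_loss)
    return total / n_years

def sampling_spread(n_years, n_catalogs=20, seed=0):
    """Standard deviation of the average-loss estimate across catalogs."""
    rng = random.Random(seed)
    estimates = [catalog_average_loss(n_years, rng) for _ in range(n_catalogs)]
    return statistics.stdev(estimates)

spread_small = sampling_spread(500)      # small catalogs
spread_large = sampling_spread(50_000)   # catalogs 100 times larger

print(f"spread across 500-year catalogs:    {spread_small:.3f}")
print(f"spread across 50,000-year catalogs: {spread_large:.3f}")
```

Under these assumptions the spread shrinks roughly with the square root of the catalog size, which is why a hundredfold increase in scenario years buys only about a tenfold reduction in sampling variability.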

A Simple Example

Consider a fault that generates earthquakes on average once every 10 years. The physics of the fault are perfectly understood such that there is no uncertainty surrounding the average recurrence interval. First, assume that earthquake probability is governed by a Poisson distribution, meaning that the average recurrence interval is known and that events occur independently of the time since the last event. In this case, the only uncertainty is aleatory, resulting from the inherent randomness in the time interval between events. In other words, even though the average time between earthquakes is certain, when the next quake will occur is not knowable.
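A minimal sketch of this first case, assuming only the 10-year average recurrence interval from the example (the variable and function names are illustrative): under a Poisson process the waiting time to the next event is exponentially distributed, so the mean is known exactly while any individual waiting time remains unpredictable.

```python
import math
import random

MEAN_RECURRENCE_YEARS = 10.0  # average interval taken from the example

def prob_at_least_one(t_years, mean_interval=MEAN_RECURRENCE_YEARS):
    """Probability of at least one event within t_years (Poisson process)."""
    return 1.0 - math.exp(-t_years / mean_interval)

# Simulate many waiting times: the long-run average is pinned down,
# but individual waits range from very short to very long.
rng = random.Random(42)
waits = [rng.expovariate(1.0 / MEAN_RECURRENCE_YEARS) for _ in range(10_000)]
mean_wait = sum(waits) / len(waits)

p_one_year = prob_at_least_one(1.0)
p_ten_years = prob_at_least_one(10.0)

print(f"mean simulated wait: {mean_wait:.2f} years")
print(f"shortest / longest wait: {min(waits):.2f} / {max(waits):.2f} years")
print(f"P(event within 1 year):   {p_one_year:.3f}")
print(f"P(event within 10 years): {p_ten_years:.3f}")
```

Note that even over a full 10-year average interval, the chance of at least one earthquake is about 63 percent rather than 100 percent; this is the aleatory uncertainty that no amount of additional data removes.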

Suppose instead that there is incomplete knowledge of how this fault behaves physically. With many more observations, it is determined that the average recurrence interval is closer to 12 years than 10. This constitutes epistemic parametric uncertainty, which results from incomplete data and is reduced as more observations become available.

Now, suppose that new evidence suggests that the fault ruptures in clusters of earthquakes within a relatively short period of time. While the long-term average recurrence interval may still hold, a Poisson distribution is not the most appropriate for this particular fault. In this case, there is epistemic model uncertainty in how to properly account for clustering.
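To see why the choice of distribution matters even when the long-term average is unchanged, consider the following sketch. It is a deliberately crude clustering model, not a description of any real fault or of AIR's approach: the 50-year window, the fixed cluster size of three, and the assumption that each cluster falls entirely within one window are all illustrative. Both processes average one event per 10 years, but the clustered one produces far more variable event counts.

```python
import random
import statistics

RATE = 0.1        # long-term average: one event per 10 years (from the example)
WINDOW = 50.0     # years per observation window (assumed)
CLUSTER_SIZE = 3  # events per cluster (assumed)

def count_arrivals(rate, window, rng):
    """Count Poisson arrivals in a window by summing exponential gaps."""
    count = 0
    t = rng.expovariate(rate)
    while t < window:
        count += 1
        t += rng.expovariate(rate)
    return count

def dispersion(counts):
    """Variance-to-mean ratio of counts; close to 1 for a Poisson process."""
    return statistics.variance(counts) / statistics.fmean(counts)

rng = random.Random(7)
poisson_counts = [count_arrivals(RATE, WINDOW, rng) for _ in range(5000)]
# Clustered case: clusters arrive as a Poisson process at one third of the
# rate, and each cluster contributes CLUSTER_SIZE events to its window.
clustered_counts = [
    CLUSTER_SIZE * count_arrivals(RATE / CLUSTER_SIZE, WINDOW, rng)
    for _ in range(5000)
]

print(f"mean events per window: Poisson {statistics.fmean(poisson_counts):.2f}, "
      f"clustered {statistics.fmean(clustered_counts):.2f}")
print(f"dispersion: Poisson {dispersion(poisson_counts):.2f}, "
      f"clustered {dispersion(clustered_counts):.2f}")
```

A model calibrated only to the average recurrence interval would treat these two faults identically, which is the sense in which the choice between them is a matter of model uncertainty rather than parameter uncertainty.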

Secondary Uncertainty

Secondary uncertainty is the uncertainty associated with the damage and loss estimation should a given event occur. Part of this can be attributed to uncertainty in the local intensity (ground motion or wind speed) of a particular event at a given location. The ground motion prediction equations used in earthquake models and the windfield profiles used in hurricane models are physical and statistical representations of very complex phenomena. Depending on the underlying assumptions, parameters, and the set of data used, different equations (i.e., alternative models) for calculating local intensity are possible, and the choice of which model or models to use constitutes epistemic model uncertainty.

Translating local intensity to building performance is another source of secondary uncertainty. Because actual damage data is scarce, especially for the most severe events, statistical techniques alone are inadequate for estimating building performance. As a result, AIR constructs damage functions based on a combination of historical data, engineering analyses (both theoretical and empirical), claims data, post-disaster surveys, and information on the evolution of building codes. While graphical representations of damage functions typically only show the mean damage ratio, there is actually a full probability surface that allows for non-zero probabilities of 0% or 100% damage. This probability surface encapsulates the aleatory uncertainty in the estimation of both the local intensity and damage.
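The idea of a distribution around the mean damage ratio can be sketched as follows. This is not AIR's damage function: the mixture form (a point mass at 0% damage, a point mass at 100% damage, and a beta distribution in between), the parameter values, and the function name are all assumptions chosen only to show how a single mean can coexist with non-zero probabilities of no damage and total loss at one intensity level.

```python
import random

def sample_damage_ratio(rng, mean_damage=0.30, p_zero=0.10, p_total=0.05,
                        concentration=4.0):
    """Draw one damage ratio in [0, 1] from an assumed three-part mixture."""
    u = rng.random()
    if u < p_zero:
        return 0.0   # building escapes damage entirely
    if u < p_zero + p_total:
        return 1.0   # building is a total loss
    # Partial damage: choose the beta mean so that the overall mixture
    # mean equals mean_damage.
    partial_mean = (mean_damage - p_total) / (1.0 - p_zero - p_total)
    alpha = partial_mean * concentration
    beta = (1.0 - partial_mean) * concentration
    return rng.betavariate(alpha, beta)

rng = random.Random(2010)
samples = [sample_damage_ratio(rng) for _ in range(20_000)]

sample_mean = sum(samples) / len(samples)
frac_zero = sum(1 for s in samples if s == 0.0) / len(samples)
frac_total = sum(1 for s in samples if s == 1.0) / len(samples)

print(f"mean damage ratio:    {sample_mean:.3f}")
print(f"share with 0% damage: {frac_zero:.3f}")
print(f"share with 100% loss: {frac_total:.3f}")
```

Two buildings subjected to the same intensity are two independent draws from such a distribution, so one may be a total loss while the other is untouched.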

Figure 2. During the 1999 earthquake in Chi-Chi, Taiwan, one apartment building toppled to the ground, while an identical one nearby remained intact, illustrating the need to account for aleatory uncertainty (Source: AIR)

This brings us to a final point about uncertainty in damage estimation, and ultimately, in insured losses. As large loss U.S. hurricanes in recent years have demonstrated, the reliability of model output is only as good as the quality of the input exposure data. Uncertainties or inaccuracies in building characteristics or replacement values can propagate dramatically into the estimates of losses.

An Independent Study

For as long as catastrophe models have been around, state regulators have been concerned with whether the premiums charged by insurance companies based on model output are fair and reasonable. Depending on the model used, these rates can vary quite significantly, especially for geographic locations where historical data may be relatively scarce. Modelers aim for a comprehensive treatment of uncertainty, but what exactly does that mean? During the past few years, two independent researchers, Charles Watson and Mark Johnson, have led an ongoing study sponsored by the Florida Commission on Hurricane Loss Projection Methodology to objectively assess and benchmark hurricane loss modeling results. Watson and Johnson created an ensemble of 972 models by combining nine published public domain submodels for wind, four submodels for surface friction, nine damage functions for wood-frame buildings, and three statistical approaches to event generation.
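The ensemble size follows directly from the number of options in each component: 9 × 4 × 9 × 3 = 972. As a small sketch, with placeholder labels standing in for the actual published submodels (the names below are hypothetical), every combination can be enumerated as a Cartesian product:

```python
from itertools import product

# Placeholder labels; the study's actual submodels are not named here.
wind_submodels = [f"wind_{i}" for i in range(1, 10)]         # nine
friction_submodels = [f"friction_{i}" for i in range(1, 5)]  # four
damage_functions = [f"damage_{i}" for i in range(1, 10)]     # nine
event_generators = [f"events_{i}" for i in range(1, 4)]      # three

# Each ensemble member is one choice from each component.
ensemble = list(product(wind_submodels, friction_submodels,
                        damage_functions, event_generators))

print(len(ensemble))  # 9 * 4 * 9 * 3 = 972
print(ensemble[0])
```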

Because these submodels are supported by published literature and considered scientifically objective, this study was assumed to capture the full range of epistemic uncertainty surrounding hurricane loss estimation. The researchers calculated loss cost for each of Florida’s 67 counties using all 972 public models, and compared it to results from proprietary hurricane models used by insurers, including the AIR model (see Figure 3).

Figure 3. Loss Cost for Florida Counties (Source: FCHLPM 2007)

Since this study includes a broad spectrum of credible models, the model output is highly variable and the range between the maximum and minimum losses in most counties is extremely wide. To rely on such distributions for ratemaking purposes would be difficult; after all, rates and prices are single values. While the study is undoubtedly a significant contribution to the literature on hurricane risk modeling, it can be argued that the range may overstate the uncertainty because not all of the 972 models perform well when validated against historical losses. For a detailed technical discussion of this research, the reader is referred to Watson and Johnson, 2004 (see note 2).

Clearly, catastrophe modelers cannot tackle the problem of uncertainty by providing endless combinations of subcomponents (for one, the computational requirements would make it infeasible); a narrower view is needed. As shown in the graphical comparison above, AIR’s U.S. hurricane model (which is not one of the 972 models) is near the median for most counties. This brings into focus the value of a well-developed catastrophe model, which is only possible through the collaboration of a multidisciplinary team of scientists, engineers, statisticians, actuaries, and software developers. In the interest of disclosure, it should be said that catastrophe models benefit greatly from the abundance of claims data arising from Florida hurricanes; models for other regions and perils may be characterized by considerably greater uncertainty.

Users of catastrophe models should not assume that more choice is necessarily a suitable approach to addressing uncertainty. This is all the more important to recognize because risk is very sensitive to changes in the underlying estimates of hazard. Seemingly independent and well-validated model components do not guarantee a robust model when pieced together. AIR’s development process does not merely entail a selection of subcomponents, but requires a deep and comprehensive understanding of the overall model and its ultimate goal: to produce realistic and reliable estimates of loss.


Every model update attempts to reduce uncertainty by incorporating cutting-edge research and the latest available observations and claims data. AIR’s methodology is transparent, rigorous, and scientifically defensible. Our researchers undertake a meticulous review of scientific literature and conduct their own research where appropriate. Uncertainty is carefully considered and incorporated in the model components, each of which is thoroughly calibrated and validated against actual data.

To be sure, there is still much to be improved upon. The future of catastrophe modeling lies in further advancing the state of scientific knowledge and in refining how uncertainty is addressed and reported. However, even with a perfect understanding of the physical world (which we are far from claiming), there will still be pure—and irreducible—randomness in nature. Uncertainty is an inherent part of catastrophe models. But lacking a firm sense of where it comes from and how it is addressed in the models, the most skeptical may dismiss the value of catastrophe modeling and consider it futile in the face of so much uncertainty. Without uncertainty, however, there would be no risk, and without risk, no insurance.

For model users to effectively mitigate losses and to identify business opportunities, it is important to be able to recognize and understand uncertainty—both inherent in the model and introduced by input exposure data—and to incorporate the most comprehensive and robust view of risk into their decision making processes.

1 Report to the Florida House of Representatives, Comparison of Hurricane Loss Projection Models, November 5, 2007

2 Watson, C. Jr, and M. Johnson. 2004. Hurricane Loss Estimation Models, Opportunities for Improving the State of the Art, Bulletin of the American Meteorological Society, 85, 1713-1726.


