Addressing Challenges in Historical Severe Thunderstorm Data: Are You Biased?
Mar 26, 2014
Editor's Note: AIR is releasing a major update to the AIR Severe Thunderstorm Model for the United States in June 2014. This article provides an overview of a significant enhancement to the hazard component—a unique blend of statistical and physical modeling—that overcomes historical data limitations and reporting biases.
Severe thunderstorms occur relatively frequently in comparison to other natural catastrophes, such as earthquakes and hurricanes. However, despite plenty of recent data on supercell thunderstorms, squall lines, and their main sub-perils (i.e., tornadoes, hailstorms, and straight-line winds), the shortage and inherent biases of historical data are among the biggest challenges in developing a model that provides full spatial coverage of simulated events throughout the continental United States. The 2014 model update overcomes these challenges and provides companies with the high-resolution, detailed view of the risk needed to assess the impact of severe thunderstorm losses on their portfolios.
Biases in Storm Prediction Center Data
An accurate estimate of insured losses resulting from severe thunderstorms is inextricably linked to the observed frequency and intensity of the individual hailstorms, tornadoes, and straight-line winds that make up these events. The Storm Prediction Center (SPC) has collected these data since the 1950s in event reports called in by local authorities, trained weather spotters, and the general public. However, the data contain reporting biases.
The clearest indicator of reporting bias in the SPC data is the dramatic increase in severe thunderstorm reports over the last fifty years—despite the fact that conditions conducive for severe thunderstorm formation have remained relatively stable. This increase may be partially attributed to the National Weather Service's (NWS) efforts to recruit and train event spotters starting in the 1970s, an increase in event awareness and interest among the general public, the installation of Doppler radar systems at local and regional weather stations in the 1980s, and the increasing availability of wireless technology starting in the 1990s. The result is duplicate event reporting in some regions and underreporting in others.
Another source of bias is the non-uniform distribution of population. Early studies on severe thunderstorm activity show just how pronounced an effect differences in population density can have on the observed frequency of the sub-perils. For example, the very first tornado frequency maps, produced in 1887, showed no tornadoes in the sparsely populated stretch between Oklahoma City, Oklahoma, and Amarillo, Texas1—an area located in what's known today as "Tornado Alley."
While the population bias in SPC data is not nearly as dramatic as it was in 1887, it is still evident. Figure 1 shows the normalized counts of tornado reports and tornado days for the U.S. from 1950-2005.2 A clear correlation exists between the number of reported tornadoes and the normalized population—an indicator that at least some of the increase (especially for weak tornadoes) is purely a function of increasing population density and urbanization.
Differences in the reporting procedures also introduce bias. For example, a 2005 examination of severe wind reports shows distinct state boundaries—a clear indicator of differences in reporting procedures among the various NWS offices3 (see Figure 2).
To build a model that provides a complete and accurate representation of severe thunderstorm potential, the reporting bias in the SPC data must be addressed. AIR scientists did this by combining SPC data with Climate Forecast System Reanalysis data from the National Center for Atmospheric Research (NCAR) for the period from 1979 to 2011—the period for which both data sets are available.
Climate Forecast System Reanalysis Data
Climate Forecast System Reanalysis (CFSR) data sets are the result of a project initiated by the National Centers for Environmental Prediction to provide atmospheric data at a higher spatial and temporal resolution for climate studies. The data sets represent a "best estimate" of the state of the atmosphere based on both observational data and numerical weather prediction (NWP). As a part of this project, observational data were taken from surface-based weather stations, ocean buoys, weather satellites, precipitation gauges, and weather balloons and then input into an NWP model—a computer model designed to simulate the atmosphere. The NWP model then estimates data values in locations where observations may not have been available, thereby providing full spatial coverage.
Reanalysis data are useful for evaluating severe thunderstorm risk throughout the United States because they provide key information about the ingredients for severe thunderstorm formation (i.e., moisture, instability, rotation, and lift) based on the atmospheric conditions at the time when past events occurred. AIR researchers used these data to calculate sub-peril-specific parameters, or indexes, that correlate well with activity.
For example, the significant tornado parameter (STP) best matches the seasonal evolution of tornado risk during the historical period. Similarly, the parameter that best matched the hail frequency distribution was the significant hail parameter (SHiP). Lastly, the energy helicity index (EHI) was found to best match the pattern of straight-line wind risk, despite not traditionally being used for that purpose. The parameter values were then used to modify and supplement the SPC reports as part of a statistical and physical process that can aptly be described as "smart-smoothing."
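The article does not give the parameter definitions the model uses, but the energy helicity index has a simple, widely published form: instability (CAPE) multiplied by storm-relative helicity and scaled by a constant. A minimal sketch of that published formulation (the input values below are illustrative, not from the article):

```python
def energy_helicity_index(cape, srh):
    """Energy helicity index: EHI = (CAPE x SRH) / 160,000.

    cape: convective available potential energy (J/kg)
    srh:  storm-relative helicity (m^2/s^2)
    Values above roughly 1-2 are commonly associated with
    environments favorable for supercells.
    """
    return (cape * srh) / 160_000.0

# A strongly unstable, strongly sheared environment:
ehi = energy_helicity_index(cape=3000.0, srh=320.0)  # -> 6.0
```

Computing an index like this at every CFSR grid cell yields a spatial "favorability" field for a given day, which is what the smoothing steps below rely on.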
Using a traditional statistical simulation approach, historical reports—or "seed" events—are spatially perturbed equally in all directions to create a "probability surface" that allows for simulated events to occur in areas where they are possible but may not have been recorded.
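As a sketch of that traditional approach (the grid, coordinates, and length scale below are hypothetical, not AIR's actual implementation), each seed event can be spread equally in all directions with an isotropic Gaussian kernel and the result normalized into a probability surface:

```python
import math

def isotropic_probability_surface(seeds, grid, sigma):
    """Smooth each historical seed event equally in all directions
    with a Gaussian kernel, producing a normalized probability surface.

    seeds: list of (x, y) seed-event locations
    grid:  list of (x, y) grid-cell centers
    sigma: smoothing length scale, in the same units as the coordinates
    """
    weights = {}
    for cell in grid:
        gx, gy = cell
        w = 0.0
        for sx, sy in seeds:
            d2 = (gx - sx) ** 2 + (gy - sy) ** 2
            w += math.exp(-d2 / (2.0 * sigma ** 2))
        weights[cell] = w
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}

grid = [(x, y) for x in range(5) for y in range(5)]
surface = isotropic_probability_surface(seeds=[(2, 2)], grid=grid, sigma=1.0)
peak = max(surface, key=surface.get)  # probability peaks at the seed cell
```

Because the kernel depends only on distance, the surface is symmetric around each seed: a simulated event is equally likely to be perturbed in any direction, regardless of whether conditions there could actually support a storm.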
The updated AIR Severe Thunderstorm Model for the United States takes this approach one step further by using the meteorological parameters derived from the CFSR data to "smart-smooth" the SPC reports in a more physically realistic way. Smart-smoothing (1) reduces smoothing into areas where parameter values are unfavorable for severe thunderstorm formation and (2) increases the likelihood of an event being perturbed into areas where the parameter values are extremely high. This second step is crucial to capturing the risk in rural areas where events are more likely to go unreported.
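A minimal sketch of that idea, again with hypothetical inputs: the distance-based Gaussian weight toward each grid cell is scaled by a meteorological favorability value (standing in for an index such as STP), so probability is suppressed where conditions are unfavorable and boosted where they are extreme:

```python
import math

def smart_smooth(seeds, grid, sigma, parameter):
    """'Smart-smooth' seed events: the Gaussian weight at each grid
    cell is multiplied by a meteorological parameter value, so the
    resulting probability surface respects the physics of the day.

    parameter: dict mapping grid cell -> non-negative favorability value
    """
    weights = {}
    for cell in grid:
        gx, gy = cell
        w = 0.0
        for sx, sy in seeds:
            d2 = (gx - sx) ** 2 + (gy - sy) ** 2
            w += math.exp(-d2 / (2.0 * sigma ** 2))
        weights[cell] = w * parameter[cell]  # physical modulation
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}

# Two cells equidistant from the seed; one sits in a favorable
# environment, the other does not.
grid = [(0, 0), (2, 0), (-2, 0)]
favorability = {(0, 0): 1.0, (2, 0): 2.0, (-2, 0): 0.1}
surface = smart_smooth(seeds=[(0, 0)], grid=grid, sigma=1.0,
                       parameter=favorability)
```

Even though the two outer cells are the same distance from the seed, the favorable cell ends up with twenty times the probability of the unfavorable one, which is exactly the asymmetry that lets simulated events reach under-reported rural areas while staying out of hostile environments.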
Figure 3 shows a case study of the process using a tornado outbreak—the seed event—in Oklahoma on May 10, 2010. Panel D shows the increase in probability in central and west Texas that results from smart-smoothing (in contrast to panel B, which shows the results of traditional smoothing). This increase in probability is consistent with tornado watches for May 10, 2010, that extended well into Texas (see Figure 4).
Adaptive Cluster Sampling
The astute reader may wonder how the subtle adjustment to the smoothed probability surface shown in Figure 3 makes much difference in the final result. In fact, without one more algorithm, borrowed from the field of botany, the results would fall short. When botanists are looking for an extremely rare plant in a large area, the real struggle is to find the first one. Once they have found the first, more specimens are likely to be located in the immediately adjacent areas.
A similar concept can be applied to severe thunderstorm simulation. Once convection initiates and spawns a tornado in one location, more tornadoes are likely to occur in the vicinity. By applying a technique called "adaptive cluster sampling," the AIR model increases the probability that individual tornadoes, hailstorms, and straight-line winds will occur in the vicinity of the first one drawn.
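The clustering step can be sketched as follows (the boost factor, radius, and uniform starting surface are illustrative assumptions, not the model's actual settings): after each event is drawn from the probability surface, the weights of nearby cells are multiplied up, so subsequent draws tend to land in the same neighborhood:

```python
import random

def adaptive_cluster_sample(surface, n_events, boost=3.0, radius=1.5, rng=None):
    """Draw event locations from a probability surface, boosting the
    weights of cells near each draw so later events cluster nearby
    (adaptive cluster sampling).

    surface: dict mapping (x, y) cell -> probability weight
    boost:   multiplier applied to cells within `radius` of each draw
    """
    rng = rng or random.Random()
    weights = dict(surface)  # copy so the input surface is untouched
    draws = []
    for _ in range(n_events):
        cells = list(weights)
        cell = rng.choices(cells, weights=[weights[c] for c in cells])[0]
        draws.append(cell)
        # Boost neighboring cells so subsequent events cluster nearby.
        for (x, y) in cells:
            if (x - cell[0]) ** 2 + (y - cell[1]) ** 2 <= radius ** 2:
                weights[(x, y)] *= boost
    return draws

uniform = {(x, y): 1.0 for x in range(10) for y in range(10)}
events = adaptive_cluster_sample(uniform, n_events=5, rng=random.Random(42))
```

The first draw behaves like ordinary sampling; it is the boost applied after each draw that produces the clusters seen in outbreak footprints, including clusters seeded in low-probability areas that smart-smoothing made reachable.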
Figure 5 shows four plausible model perturbations of the May 10, 2010, seed event. The bottom right panel shows one cluster around the area of lower probability, which would not be possible without both smart-smoothing and adaptive cluster sampling.
The combination of smart-smoothing and adaptive cluster sampling has several benefits over a purely statistical approach. For one, it provides better realism in areas where geography impacts severe thunderstorm activity. For example, while storms do happen in northern Michigan, there are historically lower levels of severe thunderstorm activity in this region due to its location on the leeward side of Lake Michigan. However, a purely statistical methodology will perturb historical events equally southward into the Detroit area and northward into northern Michigan. By incorporating meteorological information, the SPC reports are smoothed in a physically realistic way—that is, the perturbed simulated event is less likely to have happened toward the north than the south.
The use of CFSR data allows the model to simulate events that have not happened historically (at least in the observed record), but are plausible from a meteorological perspective. A case in point is the large tornado outbreak of November 2013, which impacted Illinois and spawned an EF-4 tornado. A tornado of this strength had never been recorded in that location that late in the season. Even though this event occurred after the model's historical data set was built, the updated model's 10,000-year catalog includes 54 such tornadoes within 1 degree of the 2013 event. The model is also able to simulate events similar to the 1974 Super Outbreak, during which more than 60 EF-3 or greater tornadoes struck.
Another advantage of this methodology is that it produces more robust results along the U.S.-Canada border. Given that Canada has no SPC reports, most published climatology literature for the U.S. interpolates storm frequencies down to zero at the Canadian border. The use of CFSR data ensures that the modeled risk along the entire northern border of the U.S. is as realistic and accurate as possible.4
With every model update, AIR provides a better view of the risk by advancing the science used in model development. The combination of statistical and physical methods implemented in the 2014 update to the AIR Severe Thunderstorm Model for the United States will help companies better manage this perennial source of significant insured loss.
1 Brooks, H.E., Doswell III, C.A., and Kay, M.P. (2003). "Climatological estimates of local daily tornado probability for the United States." Weather and Forecasting, 18(4), 626-640.
2 Diffenbaugh, N.S. et al. (2008). "Does Global Warming Influence Tornado Activity?" EOS, Transactions, AGU, 89(53), 553-554.
3 Doswell III, C.A., Brooks, H.E., and Kay, M.P. (2005). "Climatological estimates of daily local nontornadic severe thunderstorm probability for the United States." Weather and Forecasting, 20(4), 577-595.
4 Note that catalogs in the AIR Severe Thunderstorm Model for the United States will be shared with the updated Canada Severe Thunderstorm Model (due to be released in 2015)—allowing for a consistent view of the risk between the two countries.