The original article by Zachary E. Ross et al., titled “3D fault architecture controls the dynamism of earthquake swarms,” was published in the June 24, 2020, edition of Science. Most of the article discusses the spatial and temporal evolution of the Cahuilla swarm, using a deep-learning detection algorithm developed by Z. Ross at Caltech to determine the times and locations of roughly 22,000 earthquakes within the swarm area, with magnitudes ranging from 0.7 to 4.0. Although the geophysical and geological evolution of the swarm is very interesting, what matters most for my purposes here is how AI technology is used in that study, and whether it is applicable to other areas of earthquake seismology to improve earthquake risk analysis. What follows is a brief discussion of Ross’ algorithm and the possible applications of AI technology for earthquake risk analysis.
Locating earthquakes is one of the fundamental tasks in earthquake seismology, and many different algorithms are available for this purpose. Over the last few decades, with the increase in the number and sensitivity of seismic recording stations around the world, most networks can now detect very small pulses above the background noise. Determining whether such low-intensity pulses are earthquakes requires sophisticated algorithms. Typically, a region is divided into a grid of cells, and the algorithm continuously scans the grid to identify earthquakes. Recent algorithms use Bayesian models to deal with uncertainties and to incorporate new information that improves network capabilities.
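As a concrete illustration of the grid-scanning idea, here is a minimal sketch of grid-search location in Python. It assumes a uniform P-wave velocity, hypothetical station coordinates, and noise-free picks; real network algorithms are far more elaborate.

```python
import math

# Hypothetical station coordinates (km) and an assumed uniform P velocity.
STATIONS = [(0.0, 0.0), (30.0, 0.0), (0.0, 30.0), (30.0, 30.0)]
VP = 6.0  # km/s


def predict_arrivals(x, y, t0):
    """P arrival time at each station for a source at (x, y), origin time t0."""
    return [t0 + math.hypot(x - sx, y - sy) / VP for sx, sy in STATIONS]


def grid_search_locate(observed, extent=40.0, step=1.0):
    """Scan a 2D grid of trial epicenters; pick the cell minimizing the
    sum of squared arrival-time residuals (origin time solved analytically)."""
    best = None
    n = len(observed)
    coords = [i * step for i in range(int(extent / step) + 1)]
    for x in coords:
        for y in coords:
            travel = [math.hypot(x - sx, y - sy) / VP for sx, sy in STATIONS]
            # Least-squares origin time for this trial location.
            t0 = sum(o - t for o, t in zip(observed, travel)) / n
            misfit = sum((o - (t0 + t)) ** 2 for o, t in zip(observed, travel))
            if best is None or misfit < best[0]:
                best = (misfit, x, y, t0)
    return best[1], best[2], best[3]


# Synthetic test: event at (12, 7) km with origin time 5 s.
obs = predict_arrivals(12.0, 7.0, 5.0)
x, y, t0 = grid_search_locate(obs)
print(x, y, t0)  # recovers (12.0, 7.0, 5.0)
```

Because the true epicenter lies exactly on a grid node here, the misfit vanishes at that node; with noisy picks, the minimum-misfit cell gives the best-fitting location at the grid resolution.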
Deep Learning (DL) Approach to Seismology
In 2018, Ross and his colleagues at Caltech published an article on a deep-learning (DL) approach to locating earthquakes. The framework builds on recent advances in DL for grid-free earthquake phase association. The word “phase” in this context refers to the S or P waves associated with an earthquake; the arrival times of these waves are used to determine an earthquake’s location. The DL system is formulated as a learning process that links phases sharing a common origin. The approach is built upon Recurrent Neural Networks (RNNs), which are designed to learn temporal and contextual relationships in sequential data. The following is Ross’ description of RNNs:
“RNNs allow for information to be passed between successive elements through the use of an internal memory state. This state is dynamically modulated by gates that control what information is retained along the way, and the parameters governing the gates themselves are learned through the training process. The outputs of RNNs are very flexible, and could be a single valued output given an input sequence, or a sequence of outputs. To date, RNNs have been applied to a variety of settings, including language translation, speech synthesis, speech recognition, image captioning, and many others. The most commonly employed variant of the RNN is the long short-term memory (LSTM) network. These networks have three gates that control the flow of information, and are useful because they are not so susceptible to training issues related to diminishing propagation of information over large sequences. In recent years, another variant called the gated recurrent unit (GRU) has become popular because it has only two gates instead of three, resulting in fewer parameters and faster training. These types of RNNs are considered state of the art for many problems including speech recognition and language translation.”
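To make the gating described above concrete, here is a minimal scalar GRU cell in pure Python. The weights are toy values standing in for parameters that training would learn; this illustrates the two-gate update rule, not the network used in the study.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, p):
    """One GRU update for scalar input x and scalar hidden state h.

    z (update gate) controls how much of the old state is overwritten;
    r (reset gate) controls how much past state feeds the candidate."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_cand


# Toy parameters standing in for values a training process would learn.
params = {"wz": 1.0, "uz": 0.5, "bz": 0.0,
          "wr": 1.0, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}

h = 0.0
for x in [0.2, -0.1, 0.4]:  # a short input sequence
    h = gru_step(x, h, params)
print(h)  # final hidden state summarizing the sequence
```

Feeding a sequence through the cell one element at a time is exactly the "internal memory state" mechanism the quote describes: the hidden state h carries information forward, and the gates decide how much of it survives each step.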
DL Algorithms and What They Can Reveal about Earthquakes
Fundamentally, DL algorithms are designed to learn automatically from raw data to achieve defined objectives. Ross et al. used synthetic data to train their DL algorithm: 811 past and present stations in Southern California, 88 stations of Japan’s Hi-net seismic network, and a 1D velocity profile were used to simulate synthetic earthquakes whose P- and S-wave arrival times at the stations include some uncertainty. They used 75% of the 12 million sequences for training and 25% for validation.
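The 75/25 split itself is mechanical; a sketch, with mock sequences standing in for the synthetic phase data:

```python
import random

# Stand-in for the millions of synthetic phase sequences used in training;
# here each "sequence" is just a short list of mock arrival times.
sequences = [[random.random() for _ in range(5)] for _ in range(1000)]

random.shuffle(sequences)              # avoid any ordering bias
split = int(0.75 * len(sequences))     # 75% train / 25% validation
train, validation = sequences[:split], sequences[split:]
print(len(train), len(validation))     # 750 250
```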
A DL algorithm is excellent at finding and resolving nonlinear patterns in data; however, like all AI algorithms, it requires a large amount of data for training. Without data, the system cannot learn about the types and scales of uncertainties needed to develop constructive responses for pattern identification or forecasting. In certain cases, the training data can be constructed through simulation, as in the Ross et al. study, provided the mechanism and the parameters that define the system are well understood. For example, for locating earthquakes, detailed information on the regional velocity-depth profile and on seismic wave propagation from source to stations can be used to compute P- and S-wave arrival times at stations under various levels of uncertainty.
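A minimal sketch of how such synthetic training data might be generated, assuming a uniform-velocity medium, hypothetical station coordinates, and Gaussian pick noise (the study itself used a realistic 1D velocity profile and real station geometries):

```python
import math
import random

VP, VS = 6.0, 3.5  # assumed uniform P and S velocities (km/s)
STATIONS = [(0.0, 0.0), (40.0, 10.0), (15.0, 35.0)]  # hypothetical (km)


def synthetic_event(sigma=0.1):
    """One synthetic earthquake: random epicenter plus noisy P/S arrivals.

    sigma (s) models pick and velocity-model uncertainty."""
    x, y = random.uniform(0, 50), random.uniform(0, 50)
    arrivals = []
    for sx, sy in STATIONS:
        d = math.hypot(x - sx, y - sy)
        arrivals.append((d / VP + random.gauss(0, sigma),   # P pick
                         d / VS + random.gauss(0, sigma)))  # S pick
    return (x, y), arrivals


src, picks = synthetic_event()
for tp, ts in picks:
    print(round(tp, 2), round(ts, 2))
```

Generating millions of such labeled examples, with the true source known by construction, is what makes supervised training possible when real labeled data are scarce.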
A DL algorithm can also be applied to detecting the aftershock patterns of earthquakes, for both future and past occurrences. The latter application can play an important role in formulating the long-term rate of regional seismicity by removing transient seismicity from the regional earthquake catalog.
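Removing transient (aftershock) seismicity from a catalog is classically done with space-time windowing methods such as Gardner-Knopoff declustering; a DL approach would aim to learn such dependence patterns from data instead. A toy windowing sketch, with illustrative (uncalibrated) window coefficients:

```python
import math

# Toy catalog, sorted by time: (time_days, x_km, y_km, magnitude).
catalog = [
    (0.0, 0.0, 0.0, 5.5),    # mainshock
    (0.5, 2.0, 1.0, 3.1),    # likely aftershock
    (1.2, 1.0, 0.5, 2.8),    # likely aftershock
    (40.0, 80.0, 60.0, 3.0), # independent background event
]


def window(mag):
    """Simplified space-time aftershock window: larger mainshocks keep
    larger windows. These coefficients are illustrative, not calibrated."""
    dist_km = 10 ** (0.12 * mag + 0.25)
    time_days = 10 ** (0.4 * mag - 1.0)
    return dist_km, time_days


def decluster(events):
    """Drop events falling inside the window of any earlier, larger event."""
    keep = []
    for i, (t, x, y, m) in enumerate(events):
        dependent = False
        for t2, x2, y2, m2 in events[:i]:
            if m2 <= m:
                continue
            dkm, dday = window(m2)
            if 0 <= t - t2 <= dday and math.hypot(x - x2, y - y2) <= dkm:
                dependent = True
                break
        if not dependent:
            keep.append((t, x, y, m))
    return keep


background = decluster(catalog)
print(len(background))  # mainshock and distant event survive: 2
```

The surviving "background" events are what feed a long-term regional rate estimate; the two events inside the M5.5 window are treated as transients and removed.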
Using InSAR data, a DL algorithm can be applied to identifying earthquake damage. In recent years, coherence-analysis information from InSAR images has been used to identify areas possibly damaged by earthquakes; however, large variabilities make the identification rather uncertain. DL algorithms, with some creative formulation, might provide systematic and robust approaches for mapping observed InSAR coherence changes to areas damaged by earthquakes, and possibly for determining the corresponding scale of the damage. Another application of DL algorithms might be in processing earthquake claims data to characterize building damage ratios. Claims data by nature include very large variabilities due to various types of uncertainties. DL algorithms may help identify nonlinear patterns in such data to improve the formulation of various aspects of building damage functions. DL may also be useful in finding patterns in claims data regarding a portfolio’s behavior in response to earthquakes.
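Whatever the learning architecture, the underlying signal is a drop in interferometric coherence between pre- and co-event image pairs. A crude thresholding baseline on toy coherence maps shows the kind of mapping a DL model would refine; the threshold value here is an assumption for illustration:

```python
# Toy pre- and co-event coherence maps (rows x cols, values in [0, 1]).
pre = [[0.90, 0.85, 0.80],
       [0.88, 0.90, 0.82],
       [0.87, 0.80, 0.90]]
post = [[0.88, 0.30, 0.78],
        [0.20, 0.25, 0.80],
        [0.85, 0.79, 0.88]]

DROP_THRESHOLD = 0.4  # assumed coherence loss marking possible damage


def damage_mask(pre_map, post_map, thresh=DROP_THRESHOLD):
    """Flag pixels whose coherence dropped sharply between acquisitions,
    a crude proxy for surface change or building damage."""
    return [[(a - b) >= thresh for a, b in zip(ra, rb)]
            for ra, rb in zip(pre_map, post_map)]


mask = damage_mask(pre, post)
flagged = sum(cell for row in mask for cell in row)
print(flagged)  # 3 pixels flagged
```

A fixed threshold is exactly where the "large variabilities" bite (vegetation, weather, and viewing geometry also lower coherence); a trained model would, in effect, learn a context-dependent replacement for this single number.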
Is Large Magnitude Earthquake Forecasting Possible?
In areas where data is scarce and the causative mechanism is not well understood, it is hard to formulate DL and other AI algorithms that detect reliable patterns or make reliable forecasts. One such area is forecasting the occurrence of large magnitude earthquakes on faults and subduction zones. Seismologists understand the mechanism of strain accumulation and release within seismically active regions and subduction zones. It is rather difficult, if not impossible, however, to translate this understanding into a reliable short-term forecast of the occurrence of large magnitude earthquakes. In other words, the algorithm needs to be intelligent enough to determine when and how a small rupture initiated at a location on a fault or subduction zone can grow to become a large or gigantic earthquake. The complex dynamics that control rupture growth are situated tens of kilometers below the surface and are not accessible to seismologists. The only way to learn about these complexities is through inference, and there is not enough data from past earthquakes of this kind to train DL algorithms. Furthermore, there are large epistemic uncertainties about the state of faults and subduction zones at these depths that make it unrealistic to simulate synthetic data for DL training purposes.
In recent years, variations of AI algorithms have been used to find correlation patterns between the occurrence of earthquakes and geophysical, geological, and other relevant parameters, with the idea that they can be utilized as forecasting tools. The AI training is mostly done using small magnitude earthquakes because of their abundance. The results of such studies have often been extrapolated to extend the forecasting capabilities to large magnitude earthquakes using a combination of deterministic and probabilistic techniques. These pattern recognition models are interesting, with some forecasting capability for small- to moderate-magnitude earthquakes, but they cannot provide reliable and consistent forecasting results for large magnitude earthquakes because of the lack of data with which to properly train the algorithms.
Another area of DL application is earthquake alarm systems, where detection of the less destructive P wave of a large magnitude earthquake can trigger an alarm before the arrival of the more destructive S and surface waves. Work in this area has been reported by the National Research Institute for Earth Science and Disaster Resilience (NIED) in Japan and by Earthquake Alarm Systems (ElarmS) in California. The idea is to accurately estimate the earthquake source information, such as the magnitude and the location of the epicenter, a few seconds after the first P-wave arrivals at the recording stations. Three seconds of P-wave data from at least four stations are used to forecast the source parameters. The AI algorithm is used to distinguish large magnitude earthquakes from small ones in order to trigger an alarm.
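A simplified sketch of the alarm logic: estimate magnitude from each station's early P-wave peak displacement using an assumed scaling law (the coefficients below are placeholders, not a published calibration), then trigger only when enough stations agree on a large magnitude:

```python
import math

# Illustrative Pd scaling: log10(Pd_cm) = A*M + B*log10(R_km) + C.
# A, B, C are placeholder values, not a calibrated relation.
A, B, C = 0.7, -1.0, -3.5


def magnitude_from_pd(pd_cm, dist_km):
    """Invert the assumed scaling for magnitude from one station's
    3-second peak P-wave displacement."""
    return (math.log10(pd_cm) - B * math.log10(dist_km) - C) / A


def early_warning(readings, threshold=5.0, min_stations=4):
    """readings: list of (pd_cm, dist_km) pairs. Alarm only when at least
    min_stations report and the averaged magnitude exceeds threshold."""
    if len(readings) < min_stations:
        return None, False
    mags = [magnitude_from_pd(pd, r) for pd, r in readings]
    m = sum(mags) / len(mags)
    return m, m >= threshold


m, alarm = early_warning([(0.05, 30.0), (0.02, 60.0),
                          (0.03, 45.0), (0.06, 25.0)])
print(round(m, 1), alarm)
```

Averaging over several stations is what suppresses single-station noise; the DL refinement in real systems lies in replacing the fixed scaling law with source parameters learned from waveform data.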
What Does the Future Hold?
In recent years, there has been some effort to develop an Internet of Things (IoT) platform for seismology. The IoT is defined as an Internet-based platform that enables advanced information sharing through interoperable communication technologies; it has been projected that 50–100 billion devices will be connected to the internet via IoT by the end of 2020. In seismology, IoT refers to platforms where monitoring devices are linked to local computers that can communicate with local monitoring networks. It would be possible to develop a smart, connected global seismic monitoring network in which real-time seismic data are collected locally, and DL-enhanced monitoring devices then identify the useful and relevant data to be transmitted for processing and seismic analysis. One can imagine many uses for such a platform, not only for furthering our understanding of earthquakes but also for earthquake preparation and resilience.
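On-device filtering of this kind could be as simple as a classic STA/LTA trigger, which flags windows where short-term signal energy jumps relative to the long-term background, so only interesting segments are transmitted; a sketch on a synthetic trace:

```python
def sta_lta(samples, n_sta=3, n_lta=10, threshold=3.0):
    """Return indices where the short-term average (STA) of squared
    amplitude exceeds `threshold` times the long-term average (LTA)."""
    energy = [s * s for s in samples]
    triggers = []
    for i in range(n_lta, len(energy)):
        sta = sum(energy[i - n_sta:i]) / n_sta
        lta = sum(energy[i - n_lta:i]) / n_lta
        if lta > 0 and sta / lta >= threshold:
            triggers.append(i)
    return triggers


# Quiet background noise followed by a sudden large-amplitude arrival.
trace = [0.1, -0.1, 0.12, -0.08, 0.1, -0.11, 0.09, -0.1, 0.1, -0.09,
         0.1, -0.1, 2.0, -1.8, 1.9, -1.7, 1.5]
print(sta_lta(trace))  # first trigger fires just after the arrival
```

A DL-enhanced device would go one step further than this energy ratio, learning to distinguish earthquake signals from, say, traffic or machinery before deciding what to send upstream.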