Earthquake Reaearch in China  2019, Vol. 33 Issue (2): 288-304     DOI: 10.19743/j.cnki.0891-4176.201902014
Active Source Seismic Identification and Automatic Picking of the P-wave First Arrival Using a Convolutional Neural Network
XU Zhen1, WANG Tao1, XU Shanhui2, WANG Baoshan2,3, FENG Xuping1, SHI Jing1, YANG Minghan1     
1. Institute of Earth Exploration and Sensing(IEES), School of Earth Sciences and Engineering, Nanjing University, Nanjing 210046, China;
2. Institute of Geophysics, China Earthquake Administration, Beijing 100081, China;
3. School of Earth and Space Sciences, University of Science and Technology of China, Hefei 230026, China
Abstract: In seismic data processing, picking of the P-wave first arrivals takes up plenty of time and labor, and its accuracy plays a key role in imaging seismic structures. Based on the convolution neural network (CNN), we propose a new method to pick up the P-wave first arrivals automatically. Emitted from MINI28 vibroseis in the Jingdezhen seismic experiment, the vertical component of seismic waveforms recorded by EPS 32-bit portable seismometers are used for manually picking up the first arrivals (a total of 7242). Based on these arrivals, we establish the training and testing sets, including 25, 290 event samples and 710, 616 noise samples (length of each sample:2s). After 3, 000 steps of training, we obtain a convergent CNN model, which can automatically classify seismic events and noise samples with high accuracy (> 99%). With the trained CNN model, we scan continuous seismic records and take the maximum output (probability of a seismic event) as the P-wave first arrival time. Compared with STA/LTA (short time average/long time average), our method shows higher precision and stronger anti-noise ability, especially with the low SNR seismic data. This CNN method is of great significance for promoting the intellectualization of seismic data processing, improving the resolution of seismic imaging, and promoting the joint inversion of active and passive sources.
Key words: CNN     Active source seismic identification     First arrival picking     Anti-noise ability    

INTRODUCTION

In seismology, active seismic sources (blasting, vibroseis vehicles and airguns, etc.) are widely used for local-scale near-surface fine structure exploration and time-varying analysis of wave velocity (e.g., Wang Weitao et al., 2017). In terms of energy release, active sources can be classified as microquakes. In recent years, microseismic research has become a hot topic in seismology, which is of great significance in the determination of subsurface fine structures, aftershock detection (Peng Zhigang et al., 2009), earthquake early warnings (Kato A. et al., 2012) and microseismic monitoring of hydraulic fracturing (House L., 1987; Wang Peng et al., 2014; Bao Xuewei et al., 2016). The low energy and high frequency characteristics of microquakes make it difficult to effectively identify from the noise, and the exponential growth of seismic data requires improvement of the efficiency of seismic phase pickup, therefore, efficient and accurate seismic signal identification and seismic phase pickup are important foundations of microseismic research.

Since the 1970s, seismologists have proposed and developed a series of methods for automatic seismic phase identification and pickup. In essence, they extract the features of the original signals, and identify earthquakes and pick up arrival times based on the difference between the background noise and seismic signals. Commonly used methods are as follows: firstly, the ratio between short and long time windows (STA/LTA method, Stevenson P. R., 1976; Allen R. V., 1978; Baer M. et al., 1987; Liu Han et al., 2014). Based on amplitude characteristics, this method is widely used because of its simplicity and efficiency in calculation, by which, the ratio of the average value of wave packets in short and long time windows (or other characteristic functions) is calculated and the threshold value is set for pickup. However, parameters such as the lengths of short and long time windows and the threshold value need to be set manually. Moreover, this method is not competent when the signal-to-noise ratio is low; Secondly, the Akaike information criterion method (AIC method, Maeda N., 1985; Sleeman R. et al., 1999; Wang Haijun et al., 2003; Liu Xiqiang et al., 2009; Zhao Dapeng et al., 2012). This method takes the minimum value of the AIC function of seismic signals as the cut-off point between background noise and seismic signals to pick seismic phase, and the disadvantage is that the AIC function has the minimum value with or without earthquakes; Thirdly, methods based on features of frequency domain. Such methods use fast Fourier transform (Shensa M. J., 1977), S transform(Stockwell R.G. et al., 1996) or wavelet transform (Liu Xiqiang et al., 2000) to calculate frequency spectrum or phase information to identify earthquakes and pick up seismic phase, and the disadvantage is low time resolution; Fourthly, the correlation-based approach (Gibbons S. J. et al., 2006; Shelly D. R. et al., 2007; Brown J. R. et al., 2008; Zhang Miao et al., 2015; Hou Jinxin et al., 2017; Li Lu et al., 2017). This method mainly uses the similarity of waveforms of the earthquake and can accurately identify microquakes in seismic records with low signal-to-noise ratio, but the template earthquake needs to be set in advance and it's easy to miss earthquakes with dissimilar waveforms. In addition, scholars also put forward ways which integrate the above methods (Tan Yuyang et al., 2016; He Xianlong et al., 2016; Jiang Ce et al., 2018). In theory, they can make up for the shortcomings of each method, but due to the necessity to comprehensively consider the working conditions and accuracy judgment of all methods involved, the algorithm is complex and requires human participation, which makes it difficult to fully automate execution.

The convolutional neural network (CNN) is one of the frameworks most widely used in deep learning, which has made great breakthroughs in image recognition, speech recognition, natural language processing, object detection and semantic segmentation(e.g., Kim Y., 2014; Rawat W. et al., 2017) by feature extraction of the input data through the convolution layer (LeCun Y. et al., 1998), and seismologists have also begun to apply this method of extracting features from data to seismic signal identification and automatic pickup of seismic phases. Perol T. et al., (2018) proposed ConvNetQuake by using a large number of three-component seismic data, and the convolutional neural network obtained by training can recognize and locate microquakes effectively. The PhaseNet proposed by Zhu Weiqiang et al.(2019) is a seismic phase pickup method based on CNN regression. AU-net is trained by using the input seismic waveform records and its pre-set output waveform, and the trained U-net can directly pick up travel times of P and S seismic phases. Zhao Ming et al. (2018) constructed a large number of high-quality training and testing sample sets by using the arrivals of P and S waves of more than 10, 000 aftershocks of the Wenchuan earthquake manually selected by experts, and trained the convolutional neural network to be applied in aftershock detection and seismic detection of China Earthquake Networks Center. Besides, other deep learning networks are also gradually introduced into the field of seismology. For example, DeVries P. M. R. et al., (2018) collected a large number of mainshock-aftershock pairs and used this data to train a deep artificial neural network. The trained network can predict the probability of aftershock location to some extent. Yu Ziye et al. (2018) applied the inception network structure used in face recognition to the pick-up of first arrivals of P-wave and S-wave of natural earthquakes. The convolutional neural network and deep learning technology have been proved to be helpful to seismological research, but there is still a lack of specific research on the recognition and pickup of active source signals.

In the experiment in Jingdezhen, Jiangxi Province at the end of 2017, the active-source seismic signals excited by vibroseis vehicles were received by the two-dimensional array commonly used for passive seismic study (covering 100km2, the blue triangles in Fig. 1(a)). In this study, the continuous waveforms recorded by the array are used to train CNN to identify active-source seismic signals and pick up P-wave first arrivals. Compared with the CNN used for natural earthquake identification and seismic pickup in the past, since the time and spatial location of active seismic source are known, we can more precisely evaluate the accuracy of microseismic identification and seismic phase pickup of CNN at a low SNR. In this study, the arrival times of a large number of P waves excited by active source are picked up manually, and a large number of samples are acquired by intercepting time windows of earthquakes and noise. Then CNN is trained to accurately identify different samples, and the trained CNN is used to scan continuous waveform and applied to the pickup of P-wave first arrivals. The results show that the trained CNN is with a high degree of accuracy and precision in the identification of active source seismic signals and automatic pick up of P-wave first arrivals, and can deal with data with low SNRs.

Fig. 1 Distribution of seismic stations and epicenters and waveform recording (a)Epicenters, stations and ray paths of seismic records manually picked up.
(b)Seismic waveforms recorded by different stations from the same earthquake and manual picking points
1 DATA AND METHODS 1.1 Data and Preprocessing

Integrated three-dimensional geophysical exploration was carried out in the Zhuxi mining area of Jingdezhen, Jiangxi Province at the end of 2017. 192 temporary seismic stations (blue triangles in Fig. 1(a)) were set up, including 108 stations with a spacing of 1km, 69 stations with a spacing of 2km and 15 stations with a spacing of 5km. The data in this study is from the continuous records of vertical components (sampling interval: 0.005s) recorded by short-period seismograph (model: EPS portable seismographs), and the active-source signals were excited by MINI-28 vibroseis developed by the Bureau of Geophysical Prospecting INC., China National Petroleum Corporation. The vibroseis worked 758 times during the period (red pentagrams in Fig. 1(a)), and a total of 7, 242 seismic records with P-wave first arrivals were picked up manually. Their ray paths are shown by the gray lines in Fig. 1(a). 85.3% of the epicenter distances of these records are within 4km.

In addition to routine de-averaging, linear detrending and the removal of instrument response, the data preprocessing also includes: 1) Cross-correlation. Instead of pulse signals, the vibroseis produces continuous vibration signals, and we need to let CNN act on signals after cross-correlation, therefore, cross-correlation calculation is done between the vibroseis scanned signals and the received signals to remove the influence of source time function, so as to improve the signal-to-noise ratio of first-arrival waves; 2) Manual picking and normalization. 7242 pieces of P-wave first-arrival times are picked up manually and marked as T1, and waveform records of 20s (1s before the time of the quake and 19s after the quake) are cutted and the amplitudes are normalized respectively. This is because the recorded amplitudes of the original waveforms vary greatly with the distances from the epicenter, so it will be difficult for CNN to extract key features of active-source seismic signals if they are directly used as training samples; 3) Acquisition of earthquake samples. According to formula (1), the SNR is calculated and the SNR threshold (SNR > 5db) is set to screen out 4, 389 seismic records, and different levels of Gaussian noise (mean value is 0 and standard deviation ranges from 0.02 to 0.1) is added to these records to obtain 25, 290 seismic records. Data cutted 1 second before and after T1 of these seismic records are taken as earthquake samples; 4) Acquisition of noise samples. A total of 710616 noise samples are obtained by cutting time windows 10-12s and 12-14s after the T1 of the above seismic records and 10-12s and 12-14s after the theoretical arrival times of seismic records of P-wave first motions (calculated according to AK135 velocity model) which are unable to be picked up manually. Then two types of samples (see samples in Fig. 2) are labeled as the corresponding expected CNN output, earthquake samples are labeled as [0, 1], and noise samples are labeled as [1, 0]. Finally, all earthquake and noise samples are divided into training and testing sets. The training set incorporates 22, 548 earthquake samples and 617, 926 noise samples, which are used to adjust parameters and hyper-parameters of the neural network, and the test set incorporates 2, 742 earthquake samples and 92, 690 noise samples, which are used to test the accuracy of neural network identification.

Fig. 2 Training samples (a), (b) are respectively earthquake samples and earthquake samples added with Gaussian noise with σ=0.1. Red crosses are arrival times of P-wave first motions picked up manually; (c), (d) are respectively noise samples and noise samples added with Gaussian noise with σ=0.1
$ {\rm{SNR}} = 10 \cdot \log \left( {\frac{{\sum\nolimits_{{T_1}}^{{T_{1 + 1s}}} {{X^2}} }}{{\sum\nolimits_{{T_{1 + 10s}}}^{{T_{1 + 11s}}} {{X^2}} }}} \right) $ (1)
1.2 Establishment and Training of Convolutional Neural Network

CNN is a kind of deep neural network with convolutional structure, which can reduce the number of parameters of the network and can alleviate the over-fitting problem of the model. Classic CNN consists of the input layer, convolutional layer, down sampling layer (also known as the pooling layer), fully connected layer and output layer. The structure of the CNN can be modified according to specific situations. The CNN is in essence a kind of mapping from the input to the output, and can automatically extract the characteristics of the input data and learn a large number of mapping relationships between the input and the output, without the need of a precise mathematical expression between any input and output data. As long as it is trained with training samples, CNN can learn the mapping relationship between the input and the output. CNN can perform supervised training directly (Bengio Y., 2009) with the back-propagation algorithm (BP algorithm). The back propagation is relative to the forward propagation. In the training of the neural network, we calculate the predicted value of the current model using forward propagation and calculate the difference between the predicted value and the theoretical value to get the loss function. According to the final cost function, the partial derivative of each parameter is obtained by using the BP algorithm, and then the parameters are adjusted and updated according to the gradient descent algorithm to realize back propagation, so as to make our model more accurate in prediction.

In this study, CNN is trained with earthquake and noise samples in training sets, so that it can extract the characteristics of seismic data and thus has the ability to distinguish earthquake time windows from noise time windows. 10% of the training data is randomly selected as the validation set to adjust hyper-parameters of the network, and the rest is used as the training set to adjust the network parameters. Through repeated experiments and comparing the classification accuracy of different models, we determine the basic structure of CNN (Fig. 3), which mainly incorporates an input layer Z0, 7 convolutional layers (Z1-Z7), 1 flattened layer (Z7) and 1 fully connected output layer. Since the input data is the seismic records with a sampling rate of 200Hz and a time window length of 2s, the input layer is a vector with a length of 400 sampling points. Each convolutional layer contains 20 convolution kernels, each of which represents a channel to extract one of the characteristics of the data of the previous layer. The length of the convolution kernel is 8, the moving step is 2, so with each convolution, the characteristic length is halved (When the characteristic lengths of the upper layer are odd numbers, the system will automatically add zero at the edge of the characteristics to make them divisible by 2). The values of neural nodes of each layer are obtained from the previous layer through convolution, summation, adding bias terms and adding activation function, and the calculation formula is as below.

Fig. 3 CNN structure Time windows of vertical seismic data of 400 sampling points are input, each convolutional layer incorporates 20 convolution kernels, and the convolution kernels move by 2 sampling points each step. Through 7 convolutional layers, the network extracts 20 sets of characteristic vectors with a length of 4, then the 20 sets of characteristic vectors are flattened and passed through a fully connected layer to output predicting results
$ Z_{c,t}^i = \sigma \left( {\sum\limits_{c' = 1}^{{c_{i - 1}}} {\sum\limits_{t' = 1}^8 {Z_{c',st + t' - 2}^{i - 1}} } \cdot W_{ct'}^i + b_c^i} \right),i \in \{ 1, \cdots ,7\} $ (2)

where, σ(·)=max (0, ·), namely, RELU activation function. The input and output channels are c′ and c respectively, t′ and t represent data of the t′th and tth sampling points of the features of the previous layer and current layer, i is the number of layers, Ci denotes the number of channels of the ith layer (the number of channels at the input layer is 1, and the number of channels at other convolutional layers is 20), s is the moving step length of convolution kernel, Z is the node value of the network, W denotes the weight, and b represents the bias term. Through 7 convolutional layers, the input data (1, 400) changes to (20, 4), which means that the network extracts 20 sets of features of the input data, and each set of features has a length of 4.

The 20 sets of features are flattened into a characteristic vector (1, 80), and after passing through a fully connected layer, a vector [Z0, Z1] with a length of 2 is obtained (formula(3), Bengio Y., 2009). Two elements of the vector, Z0 and Z1, are respectively noise and earthquake score values. Finally, Softmax function is used to normalize the score value into a probability vector [p0, p1] (formula (4), Bengio Y., 2009). c in the formula takes the value of 0 or 1, indicating respectively noise or earthquake, pC represents the probability of two kinds, and p0+p1=1. When p0 is greater than 0.5, CNN judges the input sample as noise, and when p1 is greater than 0.5, CNN judges the input sample as earthquake.

$ {Z_c} = \sum\nolimits_{c' = 1}^{80} {Z_{c'}^{\bar 7}} \cdot w_{cc'}^8 + b_c^8,c \in \{ 0,1\} $ (3)
$ {p_C} = \frac{{\exp \left( {{Z_C}} \right)}}{{\sum\nolimits_{k = 0}^1 {\exp } \left( {{Z_k}} \right)}} $ (4)

We use the cross-entropy loss function (formula (5), Gu Jinxiang et al., 2015) to optimize the entire network, where N is the number of samples. The cross-entropy loss function can show the difference between the probability distribution predicted by the model pk and ideal probability distribution qk. For each sample, the ideal probability distribution can be expressed as formula (6) (Gu Jinxiang et al., 2015), and C in the formula takes the value of 0 or 1, indicating respectively noise or earthquake. In order to avoid overfitting caused by too large network parameters, we add second-order norm penalty term of the weight matrix to the loss function to regularize the network, and the regularization factor λ=0.001.

$ L = \frac{1}{N}\sum\nolimits_{k = 1}^N {\sum\nolimits_{c = 0}^1 {q_c^k} } \log \left( {p_c^k} \right) + \lambda \sum\nolimits_{i = 1}^8 {||{W^i}||_2^2} $ (5)
$ q_c^k = \left\{ {\begin{array}{*{20}{c}} {1,{\mathop{\rm class}\nolimits} (k) = c}&{}\\ {0,\;{\mathop{\rm class}\nolimits} (k) \ne c}&{} \end{array}} \right. $ (6)

Because of the large number of network parameters and training data, in order to improve the calculation efficiency, we employ the Adam (Adaptive Moment Estimation) algorithm (Kingma D. P. et al., 2014) to minimize the loss function L. The Adam algorithm is an improved stochastic gradient descent algorithm, which can update neural network parameters iteratively based on training data. In formula (5), N is 512, which means 512 samples are randomly filled into the network for each step of training. The back-propagation algorithm is used to calculate the gradient of L function with respect to network parameters, and then update the network parameters are adjusted according to the gradient direction. The training process is repeated to make the L function increasingly smaller. The learning rate during training is set at 0.0001. If the learning rate is too high, the model is difficult to converge, and if the learning rate is too small, the convergence speed is too slow. As the training continues, the loss function converges rapidly from 0.87 and then gradually slows down, and after 20, 000 steps of training, L already converges to 0.038. At this point, it can be considered that the gap between the ideal output and the actual output is small enough, indicating that the network already has good fitting ability (Fig. 4). The hardware used in the training is GTX 1080 Ti GPU produced by NVIDIA Corporation, and the training takes about 50min for 20, 000 steps. When there are too many iterations, the accuracy of the neural network on the training set will still increase, but its generalization ability will decrease, and over-fitting occurs, meaning that its performance on the test will decline. It is necessary to constantly monitor the accuracy of the neural network for the validation set or test set. When the performance of the model for the validation set or test set does not improve, the training is terminated in advance to avoid overfitting. After repeated tests and analysis, it is found that the model with 3, 000 steps of training has the best performance for the test set, therefore, when applying the training results, we select the convolutional neural network model obtained after 3, 000 steps of training.

Fig. 4 Descent curve of the loss function The abscissa represents the number of training steps, and the ordinate represents the loss function, indicating the difference between the ideal output and the actual output
2 RESULTS AND DISCUSSION 2.1 Signal Detection

After the convolutional neural network is trained to reach the convergence state, we use the test set to detect the recognition accuracy on active-source seismic signals or noise signals. The recognition results output by classifier on the test set can be divided into four cases: TP—the number of earthquake samples recognized as earthquakes, FN—the number of earthquake samples recognized as noise, FP—the number of noise samples recognized as earthquakes, TN—the number of noise samples recognized as noise, in which P and N represent the number of positive and negative samples respectively. Generally, there are three indicators for the evaluation of the recognition ability of the neural network model:

$ \begin{array}{*{20}{c}} {{\rm{ Accuracy }} = \frac{{TP + TN}}{{P + N}}}\\ {{\rm{ Precision }} = \frac{{TP}}{{TP + FP}}}\\ {{\rm{ Recall }} = \frac{{TP}}{{TP + FN}}} \end{array} $ (7)

where the accuracy rate indicates the ratio between the number of correctly recognized samples in the test set and the total number of samples in the test set, whether positive or negative; The precision rate is the ratio of the number of positive samples correctly predicted and the number of positive samples predicted by the model; The recall rate indicates the ratio between the number of positive samples correctly predicted and the total number of positive samples in the test set. The higher the values of these three quantities are, the stronger the recognition ability of CNN is.

Test samples are inputted into the trained CNN, and a probability vector with a length of 2 is output, [p0, p1], (formula (4)), where, p0 and p1 denote the probability that the sample should be tested for noise and earthquake respectively. When p0 is greater than 0.5, CNN judges the input sample as noise, and when p1 is greater than 0.5, CNN judges the input sample as earthquake. We have calculated all the indicators of CNN in the test set, and the test results show that the accuracy, precision and recall rates of CNN are all above 98% after 1000 steps of training (Table 1), indicating that the CNN has been able to accurately distinguish the two kinds of signals by extracting the characteristics of active source earthquakes and noise signals. However, the accuracy rate of the CNN in identifying natural microquakes is only 55%—73%, the recall rate 62%—77% (Zhao Ming et al., 2018), which shows that the CNN can achieve higher precision in identifying active source earthquakes. This may be related to the more complex characteristics of the natural seismic records, because natural seismic records are affected by focal mechanism, epicentral distance, magnitude and other factors, which are uncontrollable and variable, so it is difficult to ensure the completeness of the training samples when constructing it. In comparison with natural earthquakes, active source earthquakes have the advantages of high controllability in excitation times, location and source characteristics, high repeatability and intensive observation. All things considered, active source earthquakes can realize high-precision wave velocity structure exploration of subsurface media, which effectively makes up for the low resolution of possive source imaging at shallow depths. CNN can achieve good performance in the automatic recognition of active source earthquakes and has a wide range of potential applications.

Table 1 Classification results of convolutional neural network
2.2 Picking of P-wave first arrivals

After the CNN has been trained, in order to apply it to the picking of first arrivals, we scan continuous seismic records using the trained CNN. Starting from the initial moment, CNN scans point by point with a time window of 2s. For example, the first time window is to detect seismic records from 0s to 2s and the second time window is to detect seismic records from 0.005s to 2.005s. For each time window, the model will output the probability that the time window is a seismic sample or the middle of the time window is the first arrival time of the earthquake (Fig. 5). Because when constructing earthquake samples, the first-arrival time is at the middle of the sampling time window. In order to test the effectiveness of this strategy, four continuous seismic records from the same seismic station and different seismic sources (not participating in the training) are connected together as seismic records to be detected to test the effectiveness of earthquake picking of the model. As shown in Fig. 6, CNN scans seismic records to be detected (Fig. 6(a)) and gives the detection value of probability of earthquakes at the corresponding moments (Fig. 6(b)). On the whole, CNN can pick up the first arrival time of P-wave well in continuous seismic records.

Fig. 5 CNN scanning diagram The green rectangles are time windows of 2s, the trained convolutional neural network scans the continuous waveform and gives the output probability vector

Fig. 6 Example of CNN picking (a) Seismic records to be tested; (b) the probability of earthquakes at different moments predicted by CNN based on waveform records

In order to further test the anti-noise ability of this auto-picking strategy, we add noise to the seismic waveform records, and scan the seismic records added with noise still by CNN to obtain the probability detection curves, which are compared with STA/LTA curves. The results show that the trained CNN model can still pick up seismic signals accurately, and its picking ability is better than the STA/LTA method. Both methods can pick up seismic signals when the SNR of records is high. But when the noise increases, the value of the STA/LTA function will decrease. At this point, it may not be able to pick up or the threshold value needs to be manually adjusted for picking. However, the CNN method we constructed can still provide a prediction value close to 1 for seismic signals with poor signal-to-noise ratio (Fig. 7). Thus, it can be seen that the CNN is more tolerant of noise than the STA/LTA method.

Fig. 7 Comparison of seismic signal picking results (a), (b) and(c) are seismic records with no added noise, added Guassian noise with σ=0.1 and added Guassian noise with σ=0.2; (d), (e) and(f) curves of the predicted values (earthquake probability) of the convolutional neural network based on (a), (b) and(c); (g), (h) and(i) STA/LTA curves

In order to determine the precision of first arrival picking of the CNN, we calculate the root-mean-square error between CNN, STA/LTA automatic picking of arrival times and manual picking (formula (8)), and t(i) and $\hat{t}(i)$ are respectively arrival times automatically picked up and manually marked. The moment when the probability is the maximum, obtained by scanning seismic records, is taken as the first arrival time automatically picked up by the CNN. We test its precision in first-arrival picking for simulated seismic records. When Gaussian noise is not added and added with standard deviations of 0.1, 0.2 and 0.3 respectively, for seismic records shown in Fig. 8, the root-mean-square errors of CNN picking are respectively 0.02s, 0.03s, 0.06s and 0.08s, while the root-mean-square errors of STA/LTA method are 0.10s, 0.31s, 0.89s and 1.21s (Fig. 9). It can be seen that the picking precision of the CNN method is better than the traditional STA/LTA method, especially when the SNR is low.

Fig. 8 Simulated seismic records and auto-picking points using the CNN method (a)is simulated records obtained by moving the observed seismogram with assuming moveout (2km/s), (b), (c) and (d) are seismic records added with Gaussian noise with standard deviation of 0.1, 0.2 and 0.3

Fig. 9 Simulated seismic records and auto-picking points using the STA/LTA method (a)is simulated records obtained by moving the observed seismogram with assuming moveout(2km/s), (b), (c) and (d) are seismic records added with Gaussian noise with standard deviations of 0.1, 0.2 and 0.3
$ {\mathop{\rm RMSE}\nolimits} = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^N {(t(} i) - \hat t(i){)^2}} $ (8)

In order to test the picking precision of the CNN on real seismic records, we use this model to pick up 400 real seismic records (not participating in the training), and calculate the error distribution of the picking of the P-wave first arrivals of the real seismic records (Fig. 10). The results show that the average error is 0.015±0.098s, and 71.75% of the picking error is within 0.1s. For the Zhuxi mining area, the velocity measurement results of rock core samples show that the velocity in deep Carboniferous skarn (vP:6.80km/s) and Neoproterozoic phyllitic siltstone (vP:5.85km/s) is higher, while velocity in shallow bioclastic limestone (vP:3.57km/s), surrounding marble (vP:2.73km/s) and surrounding dolomite (vP:4.61km/s) varies significantly. Assuming that the average velocities of the high-velocity and low-velocity rock mass are 6.0km/s and 3.5km/s respectively, the travel time difference of 0.1s requires the seismic wave to travel 0.1/(1/3.5-1/6.0)=0.84km in medium of different velocities, which means that the resolution of the average picking error of 0.1s in this area can reach~840m. Considering that the spacing of the seismic array used is 1km, and the spatial span of the late Paleozoic carbonate caprocks is greater than 1km (Chen Guohua et al., 2016), the trained CNN obtained in this study for automatic picking of active source P-wave first arrivals can be used for spatial distribution imaging of carbonate sedimentary cover in the Zhuxi mining area.

Fig. 10 The histogram of CNN picking error (arrivel time difference between automatic picking and manual picking) distribution

In addition, we calculate the root-mean-square errors of CNN and STA/LTA method in the case of the 400 real seismic records added with different levels of noise (Fig. 11). Comparison shows that the CNN method we constructed has better anti-noise ability than the STA/LTA method. With the increase of noise, STA/LTA picking error increases rapidly, while CNN picking error grows slowly. When Gaussian noise with standard deviation of 0.2 is added to the real seismic records, CNN can still achieve good picking precision, while the STA/LTA picking error is large. Fig. 12 shows the picking results of this model for seismic records recorded by different stations in an earthquake in the test set (partially enlarged view). When Gaussian noise is not added and added with standard deviations of 0.1, 0.2 and 0.3 respectively to the real seismic channel set, the root-mean-square errors are respectively 0.02s, 0.03s, 0.04s and 0.07s. It can be seen that the CNN can also maintain high accuracy and strong anti-noise ability in picking real seismic records.

Fig. 11 The change curves of picking errors for seismic records in test set with added noise

Fig. 12 Real seismic records and auto-picking points (a), (b), (c) and (d) are seismic records added with no noise and with Guassian noise with standard deviations of 0.1, 0.2 and 0.3; the red crosses is outo-picking points by CNN and the blue crosses is manual picking points
3 CONCLUSIONS

In this study, a training sample set is constructed by manually picking up the P-wave first arrivals and interception of seismic time window and noise time window, and a convolutional neural network which can automatically recognize two kinds of time windows is established. The network can output continuously the probability of each time window as an earthquake by scanning waveform records, and the middle of the time window with the maximum probability is taken as the first-arrival time of P-wave. By comparing this method with the STA/LTA method, the following conclusions can be drawn:

(1) The convolutional neural network has the ability to distinguish active source seismic signals from noise. For the data used in this study, after 3000 steps of training, the accuracy of the converged CNN can reach more than 99% in identifying earthquakes and noise. Compared with natural microquake recognition, CNN can achieve higher accuracy in recognizing active source seismic signals.

(2) The convolutional neural network can be used to pick up the first-arrival time of the P-wave. Compared with the STA/LTA method, the CNN is more tolerant of noise. Through a series of comparative experiments, it is found that when noise increases, the CNN has a higher precision in picking than the STA/LTA method.

The study results show that the CNN method is effective in automatically picking up P-wave first arrivals of active source signals, and can be applied to follow-up study of seismic exploration based on active sources and imaging of subsurface fine structure. Compared with traditional methods, CNN completely avoids human intervention in the process of picking first arrivals, providing an effective means for rapid processing of a large number of data. As a deep learning method, CNN's performance in recognizing or picking up seismic signals will inevitably improve with the increase of training samples, which is also its advantage over other methods for seismology in the age of big data. Moreover, in this study, active source seismic data recorded by a wide range of passive-source(including natural earthquakes and noise sources) array are used for automatic pickup of P-wave first arrivals, which may be used for joint inversion and imaging of active and passive sources.

ACKNOWLEDGEMENT

Seismic data are from the "Underground Bright Lamp" project of the mineral prospecting experiment in Jiangxi, led by Academician Chen Yong, vibroseis is provided and operated to stimulate seismic signals by the BGP INC., China National Petroleum Corporation, the seismic array for continuous observation is set up by the Geophysical Exploration Center, China Earthquake Administration, measurement of rock wave velocity is done by the Institute of Geophysics, China Earthquake Administration, and rock core samples used are provided by the 912 Brigade of Jiangxi Bureau of Geology and Mineral Resources, here we extend our deepest thanks to them. Thanks to Wang Weitao and two anonymous reviewers for their suggestions for revision.

REFERENCES
Allen R.V. Automatic earthquake recognition and timing from single trace[J]. Bulletin of the Seismological Society of America, 1978, 68(5): 1521-1532.
Bao Xuewei, Eaton D.W. Fault activation by hydraulic fracturing in western Canada[J]. Science, 2016, 354(6318): 1406-1409. DOI:10.1126/science.aag2583
Baer M., Kradolfer U. An automatic phase picker for local and teleseismic events[J]. Bulletin of the Seismological Society of America, 1987, 77(4): 1437-1445.
Bengio Y. Learning deep architectures for AI[J]. Foundations and Trends© in Machine Learning, 2009, 2(1): 1-127. DOI:10.1561/2200000006
Brown J.R., Beroza G.C., Shelly D.R. An autocorrelation method to detect low frequency earthquakes within tremor[J]. Geophysical Research Letters, 2008, 35(16). DOI:10.1029/2008GL034560
Chen Guohua, Shu Liangshu, Shu Limin, Zhang Cheng, Ouyang Yongpeng. Geological characteristics and mineralization setting of the Zhuxi tungsten (copper) polymetallic deposit in the Eastern Jiangnan Orogen[J]. Science China Earth Sciences, 2016, 59(4): 803-823. DOI:10.1007/s11430-015-5200-9
DeVries P.M.R., Viégas F., Wattenberg M., Meade B.J. Deep learning of aftershock patterns following large earthquakes[J]. Nature, 2018, 560(7720): 632-634. DOI:10.1038/s41586-018-0438-y
Gibbons S.J., Ringdal F. The detection of low magnitude seismic events using array-based waveform correlation[J]. Geophysical Journal International, 2006, 165(1): 149-166. DOI:10.1111/j.1365-246X.2006.02865.x
Gu Jinxiang, Wang Zhenhua, Jason K., Ma Lianyang, Shahroudy A., Shuai Bing, Liu Ting, Wang Xingxing, Wang Li, Wang Gang, Cai Jianfei, Chen Tsuhan. Recent advances in convolutional neural networks[J]. arXiv preprint arXiv: 1512.07108, 2015.
He Xianlong, She Tianli, Gao Feng. A new method for picking up arrival times of seismic P and S waves automatically[J]. Chinese Journal of Geophysics, 2016, 59(7): 2519-2527 (in Chinese with English abstract). DOI:10.6038/cjg20160717
Hou Jinxin, Wang Baoshan. Temporal evolution of seismicity before and after the 2014 Ludian MS6.5 earthquake[J]. Chinese Journal of Geophysics, 2017, 60(4): 1446-1456 (in Chinese with English abstract).
House L. Locating microearthquakes induced by hydraulic fracturing in crystalline rock[J]. Geophysical Research Letters, 1987, 14(9): 919-921. DOI:10.1029/GL014i009p00919
Jiang Ce, Wu Jianping, Fang Lihua. Earthquake detection and automatic phase picking[J]. Acta Seismologica Sinica, 2018, 40(1): 45-57 (in Chinese with English abstract). DOI:10.11939/jass.20170093
Kato A., Obara K., Lgarashi T., Tsuruoka H., Nakagawa S., Hirata N. Propagation of slow slip leading up to the 2011 Mw 9.0 Tohoku-Oki earthquake[J]. Science, 2012, 335(6069): 705-708. DOI:10.1126/science.1215141
Kim Y. Convolutional neural networks for sentence classification[J]. arXiv preprint arXiv: 1408.5882, 2014.
Kingma D.P., Ba J. Adam: a method for stochastic optimization[J]. arXiv preprint arXiv: 1412.6980, 2014.
LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324. DOI:10.1109/5.726791
Li Lu, Wang Baoshan, Hou Jinxin. Applications of matched filter technique in seismic data processing[J]. Earthquake Research in China, 2017, 33(1): 14-22 (in Chinese with English abstract).
Liu Han, Zhang Jianzhong. STA/LTA algorithm analysis and improvement of Microseismic signal automatic detection[J]. Progress in Geophysics, 2014, 29(4): 1708-1714 (in Chinese). DOI:10.6038/pg20140429(inChinese)
Liu Xiqiang, Zhou Huilan, Shen Ping, Ma Yanlu, Li Hong. Identification method of seismic phase in three-component seismograms on the basis of wavelet transform[J]. Acta Seismologica Sinica, 2000, 22(2): 125-131 (in Chinese with English abstract). DOI:10.3321/j.issn:0253-3782.2000.02.002
Liu Xiqiang, Zhou Yanwen, Qu Junhao, Shi Yuyan, Li Bo. Real-time detection of regional events and automatic P-phase identification from the vertical component of a single station record[J]. Acta Seismologica Sinica, 2009, 31(3): 260-271 (in Chinese with English abstract). DOI:10.3321/j.issn:0253-3782.2009.03.003
Maeda N. A method for reading and checking phase time in auto-processing system of seismic wave data[J]. Zisin (Journal of the Seismological Society of Japan. 2nd ser.), 1985, 38(3): 365-379. DOI:10.4294/zisin1948.38.3_365(inJapanese)
Peng Zhigang, Zhao Peng. Migration of early aftershocks following the 2004 Parkfield earthquake[J]. Nature Geoscience, 2009, 2(12): 877-881. DOI:10.1038/ngeo697
Perol T., Gharbi M., Denolle M. Convolutional neural network for earthquake detection and location[J]. Science Advances, 2018, 4(2): e1700578. DOI:10.1126/sciadv.1700578
Rawat W., Wang Zenghui. Deep convolutional neural networks for image classification:a comprehensive review[J]. Neural Computation, 2017, 29(9): 2352-2449. DOI:10.1162/neco_a_00990
Shelly D.R., Beroza G.C., Ide S. Non-volcanic tremor and low-frequency earthquake swarms[J]. Nature, 2007, 446(7133): 305-307. DOI:10.1038/nature05666
Shensa M.J. The deflection detector-its theory and evaluation on short-period seismic data[J]. Texas Instruments, Alexandria, Virginia, 1977, TR-77-03.
Sleeman R., van Eck T. Robust automatic P-phase picking:an on-line implementation in the analysis of broadband seismogram recordings[J]. Physics of The Earth and Planetary Interiors, 1999, 13(1/4): 265-275. DOI:10.1016/S0031-9201(99)00007-2
Stevenson P.R. Microearthquakes at flathead lake, Montana:a study using automatic earthquake processing[J]. Bulletin of the Seismological Society of America, 1976, 66(1): 61-80.
Stockwell R.G., Mansinha L., Lowe R.P. Localization of the complex spectrum:the S transform[J]. IEEE Transactions on Signal Processing, 1996, 44(4): 998-1001. DOI:10.1109/78.492555
Tan Yuyang, Yu Jing, Feng Gang, He Chuan. Arrival picking of microseismic events using the SLPEA algorithm[J]. Chinese Journal of Geophysics, 2016, 59(1): 185-196 (in Chinese with English abstract). DOI:10.6038/cjg20160116
Wang Haijun, Jin Ping, Liu Guizhong, Wang Xiaoming. Accurate estimation for arrival time of seismic wave[J]. Northwestern Seismological Journal, 2003, 25(4): 298-303 (in Chinese with English abstract). DOI:10.3969/j.issn.1000-0844.2003.04.003
Wang Peng, Chang Xu, Wang Yibo, Wang Luchen, Zhai Hongyu. Automatic event detection and event recovery in low SNR microseismic signals based on time-frequency sparseness[J]. Chinese Journal of Geophysics, 2014, 57(8): 2656-2665 (in Chinese with English abstract). DOI:10.6038/cjg20140824
Wang Weitao, Wang Baoshan, Jiang Shengmiao, Hu Jiupeng, Zhang Yuansheng. A perspective review of seismological investigation on the crust at regional scale using the active Airgun source[J]. Journal of Seismological Research, 2017, 40(4): 514-524 (in Chinese with English abstract). DOI:10.3969/j.issn.1000-0666.2017.04.002
Yu Ziye, Chu Risheng, Sheng Minhan. Pick onset time of P and S phase by deep neural network[J]. Chinese Journal of Geophysics, 2018, 61(12): 4873-4886 (in Chinese with English abstract). DOI:10.6038/cjg2018L0725
Zhao Dapeng, Liu Xiqiang, Li Hong, Zhou Yanwen. Detection of regional seismic events by kurtosis method and automatic identification of direct P-wave first motion by kurtosis-AIC method[J]. Journal of Seismological Research, 2012, 35(2): 220-226 (in Chinese with English abstract). DOI:10.3969/j.issn.1000-0666.2012.02.010
Zhang Miao, Wen Lianxing. An effective method for small event detection:match and locate (M&L)[J]. Geophysical Journal International, 2015, 200(3): 1523-1537. DOI:10.1093/gji/ggu466
Zhao Ming, Chen Shi, Zhang Bei, Yuen D.A. Automatic identification of aftershocks by convolutional neural networks and its application in waveform detection of China seismic network[J]. Recent Developments in World Seismology, 2018(8): 42-43, doi: 10.3969/j.issn.0253-4975.2018.08.035(in Chinese with English abstract).
Zhu Weiqiang, Beroza G.C. PhaseNet:a deep-neural-network-based seismic arrival-time picking method[J]. Geophysical Journal International, 2019, 216(1): 261-273. DOI:10.1093/gji/ggy423