2. Institute of Geophysics, China Earthquake Administration, Beijing 100081, China;
3. School of Earth and Space Sciences, University of Science and Technology of China, Hefei 230026, China
In seismology, active seismic sources (blasting, vibroseis vehicles and airguns, etc.) are widely used for localscale nearsurface fine structure exploration and timevarying analysis of wave velocity (e.g., Wang Weitao et al., 2017). In terms of energy release, active sources can be classified as microquakes. In recent years, microseismic research has become a hot topic in seismology, which is of great significance in the determination of subsurface fine structures, aftershock detection (Peng Zhigang et al., 2009), earthquake early warnings (Kato A. et al., 2012) and microseismic monitoring of hydraulic fracturing (House L., 1987; Wang Peng et al., 2014; Bao Xuewei et al., 2016). The low energy and high frequency characteristics of microquakes make it difficult to effectively identify from the noise, and the exponential growth of seismic data requires improvement of the efficiency of seismic phase pickup, therefore, efficient and accurate seismic signal identification and seismic phase pickup are important foundations of microseismic research.
Since the 1970s, seismologists have proposed and developed a series of methods for automatic seismic phase identification and pickup. In essence, they extract the features of the original signals, and identify earthquakes and pick up arrival times based on the difference between the background noise and seismic signals. Commonly used methods are as follows: firstly, the ratio between short and long time windows (STA/LTA method, Stevenson P. R., 1976; Allen R. V., 1978; Baer M. et al., 1987; Liu Han et al., 2014). Based on amplitude characteristics, this method is widely used because of its simplicity and efficiency in calculation, by which, the ratio of the average value of wave packets in short and long time windows (or other characteristic functions) is calculated and the threshold value is set for pickup. However, parameters such as the lengths of short and long time windows and the threshold value need to be set manually. Moreover, this method is not competent when the signaltonoise ratio is low; Secondly, the Akaike information criterion method (AIC method, Maeda N., 1985; Sleeman R. et al., 1999; Wang Haijun et al., 2003; Liu Xiqiang et al., 2009; Zhao Dapeng et al., 2012). This method takes the minimum value of the AIC function of seismic signals as the cutoff point between background noise and seismic signals to pick seismic phase, and the disadvantage is that the AIC function has the minimum value with or without earthquakes; Thirdly, methods based on features of frequency domain. Such methods use fast Fourier transform (Shensa M. J., 1977), S transform(Stockwell R.G. et al., 1996) or wavelet transform (Liu Xiqiang et al., 2000) to calculate frequency spectrum or phase information to identify earthquakes and pick up seismic phase, and the disadvantage is low time resolution; Fourthly, the correlationbased approach (Gibbons S. J. et al., 2006; Shelly D. R. et al., 2007; Brown J. R. et al., 2008; Zhang Miao et al., 2015; Hou Jinxin et al., 2017; Li Lu et al., 2017). This method mainly uses the similarity of waveforms of the earthquake and can accurately identify microquakes in seismic records with low signaltonoise ratio, but the template earthquake needs to be set in advance and it's easy to miss earthquakes with dissimilar waveforms. In addition, scholars also put forward ways which integrate the above methods (Tan Yuyang et al., 2016; He Xianlong et al., 2016; Jiang Ce et al., 2018). In theory, they can make up for the shortcomings of each method, but due to the necessity to comprehensively consider the working conditions and accuracy judgment of all methods involved, the algorithm is complex and requires human participation, which makes it difficult to fully automate execution.
The convolutional neural network (CNN) is one of the frameworks most widely used in deep learning, which has made great breakthroughs in image recognition, speech recognition, natural language processing, object detection and semantic segmentation(e.g., Kim Y., 2014; Rawat W. et al., 2017) by feature extraction of the input data through the convolution layer (LeCun Y. et al., 1998), and seismologists have also begun to apply this method of extracting features from data to seismic signal identification and automatic pickup of seismic phases. Perol T. et al., (2018) proposed ConvNetQuake by using a large number of threecomponent seismic data, and the convolutional neural network obtained by training can recognize and locate microquakes effectively. The PhaseNet proposed by Zhu Weiqiang et al.(2019) is a seismic phase pickup method based on CNN regression. AUnet is trained by using the input seismic waveform records and its preset output waveform, and the trained Unet can directly pick up travel times of P and S seismic phases. Zhao Ming et al. (2018) constructed a large number of highquality training and testing sample sets by using the arrivals of P and S waves of more than 10, 000 aftershocks of the Wenchuan earthquake manually selected by experts, and trained the convolutional neural network to be applied in aftershock detection and seismic detection of China Earthquake Networks Center. Besides, other deep learning networks are also gradually introduced into the field of seismology. For example, DeVries P. M. R. et al., (2018) collected a large number of mainshockaftershock pairs and used this data to train a deep artificial neural network. The trained network can predict the probability of aftershock location to some extent. Yu Ziye et al. (2018) applied the inception network structure used in face recognition to the pickup of first arrivals of Pwave and Swave of natural earthquakes. The convolutional neural network and deep learning technology have been proved to be helpful to seismological research, but there is still a lack of specific research on the recognition and pickup of active source signals.
In the experiment in Jingdezhen, Jiangxi Province at the end of 2017, the activesource seismic signals excited by vibroseis vehicles were received by the twodimensional array commonly used for passive seismic study (covering 100km^{2}, the blue triangles in Fig. 1(a)). In this study, the continuous waveforms recorded by the array are used to train CNN to identify activesource seismic signals and pick up Pwave first arrivals. Compared with the CNN used for natural earthquake identification and seismic pickup in the past, since the time and spatial location of active seismic source are known, we can more precisely evaluate the accuracy of microseismic identification and seismic phase pickup of CNN at a low SNR. In this study, the arrival times of a large number of P waves excited by active source are picked up manually, and a large number of samples are acquired by intercepting time windows of earthquakes and noise. Then CNN is trained to accurately identify different samples, and the trained CNN is used to scan continuous waveform and applied to the pickup of Pwave first arrivals. The results show that the trained CNN is with a high degree of accuracy and precision in the identification of active source seismic signals and automatic pick up of Pwave first arrivals, and can deal with data with low SNRs.
Integrated threedimensional geophysical exploration was carried out in the Zhuxi mining area of Jingdezhen, Jiangxi Province at the end of 2017. 192 temporary seismic stations (blue triangles in Fig. 1(a)) were set up, including 108 stations with a spacing of 1km, 69 stations with a spacing of 2km and 15 stations with a spacing of 5km. The data in this study is from the continuous records of vertical components (sampling interval: 0.005s) recorded by shortperiod seismograph (model: EPS portable seismographs), and the activesource signals were excited by MINI28 vibroseis developed by the Bureau of Geophysical Prospecting INC., China National Petroleum Corporation. The vibroseis worked 758 times during the period (red pentagrams in Fig. 1(a)), and a total of 7, 242 seismic records with Pwave first arrivals were picked up manually. Their ray paths are shown by the gray lines in Fig. 1(a). 85.3% of the epicenter distances of these records are within 4km.
In addition to routine deaveraging, linear detrending and the removal of instrument response, the data preprocessing also includes: 1) Crosscorrelation. Instead of pulse signals, the vibroseis produces continuous vibration signals, and we need to let CNN act on signals after crosscorrelation, therefore, crosscorrelation calculation is done between the vibroseis scanned signals and the received signals to remove the influence of source time function, so as to improve the signaltonoise ratio of firstarrival waves; 2) Manual picking and normalization. 7242 pieces of Pwave firstarrival times are picked up manually and marked as T1, and waveform records of 20s (1s before the time of the quake and 19s after the quake) are cutted and the amplitudes are normalized respectively. This is because the recorded amplitudes of the original waveforms vary greatly with the distances from the epicenter, so it will be difficult for CNN to extract key features of activesource seismic signals if they are directly used as training samples; 3) Acquisition of earthquake samples. According to formula (1), the SNR is calculated and the SNR threshold (SNR > 5db) is set to screen out 4, 389 seismic records, and different levels of Gaussian noise (mean value is 0 and standard deviation ranges from 0.02 to 0.1) is added to these records to obtain 25, 290 seismic records. Data cutted 1 second before and after T1 of these seismic records are taken as earthquake samples; 4) Acquisition of noise samples. A total of 710616 noise samples are obtained by cutting time windows 1012s and 1214s after the T1 of the above seismic records and 1012s and 1214s after the theoretical arrival times of seismic records of Pwave first motions (calculated according to AK135 velocity model) which are unable to be picked up manually. Then two types of samples (see samples in Fig. 2) are labeled as the corresponding expected CNN output, earthquake samples are labeled as [0, 1], and noise samples are labeled as [1, 0]. Finally, all earthquake and noise samples are divided into training and testing sets. The training set incorporates 22, 548 earthquake samples and 617, 926 noise samples, which are used to adjust parameters and hyperparameters of the neural network, and the test set incorporates 2, 742 earthquake samples and 92, 690 noise samples, which are used to test the accuracy of neural network identification.
$ {\rm{SNR}} = 10 \cdot \log \left( {\frac{{\sum\nolimits_{{T_1}}^{{T_{1 + 1s}}} {{X^2}} }}{{\sum\nolimits_{{T_{1 + 10s}}}^{{T_{1 + 11s}}} {{X^2}} }}} \right) $  (1) 
CNN is a kind of deep neural network with convolutional structure, which can reduce the number of parameters of the network and can alleviate the overfitting problem of the model. Classic CNN consists of the input layer, convolutional layer, down sampling layer (also known as the pooling layer), fully connected layer and output layer. The structure of the CNN can be modified according to specific situations. The CNN is in essence a kind of mapping from the input to the output, and can automatically extract the characteristics of the input data and learn a large number of mapping relationships between the input and the output, without the need of a precise mathematical expression between any input and output data. As long as it is trained with training samples, CNN can learn the mapping relationship between the input and the output. CNN can perform supervised training directly (Bengio Y., 2009) with the backpropagation algorithm (BP algorithm). The back propagation is relative to the forward propagation. In the training of the neural network, we calculate the predicted value of the current model using forward propagation and calculate the difference between the predicted value and the theoretical value to get the loss function. According to the final cost function, the partial derivative of each parameter is obtained by using the BP algorithm, and then the parameters are adjusted and updated according to the gradient descent algorithm to realize back propagation, so as to make our model more accurate in prediction.
In this study, CNN is trained with earthquake and noise samples in training sets, so that it can extract the characteristics of seismic data and thus has the ability to distinguish earthquake time windows from noise time windows. 10% of the training data is randomly selected as the validation set to adjust hyperparameters of the network, and the rest is used as the training set to adjust the network parameters. Through repeated experiments and comparing the classification accuracy of different models, we determine the basic structure of CNN (Fig. 3), which mainly incorporates an input layer Z^{0}, 7 convolutional layers (Z^{1}Z^{7}), 1 flattened layer (Z^{7}) and 1 fully connected output layer. Since the input data is the seismic records with a sampling rate of 200Hz and a time window length of 2s, the input layer is a vector with a length of 400 sampling points. Each convolutional layer contains 20 convolution kernels, each of which represents a channel to extract one of the characteristics of the data of the previous layer. The length of the convolution kernel is 8, the moving step is 2, so with each convolution, the characteristic length is halved (When the characteristic lengths of the upper layer are odd numbers, the system will automatically add zero at the edge of the characteristics to make them divisible by 2). The values of neural nodes of each layer are obtained from the previous layer through convolution, summation, adding bias terms and adding activation function, and the calculation formula is as below.
$ Z_{c,t}^i = \sigma \left( {\sum\limits_{c' = 1}^{{c_{i  1}}} {\sum\limits_{t' = 1}^8 {Z_{c',st + t'  2}^{i  1}} } \cdot W_{ct'}^i + b_c^i} \right),i \in \{ 1, \cdots ,7\} $  (2) 
where, σ(·)=max (0, ·), namely, RELU activation function. The input and output channels are c′ and c respectively, t′ and t represent data of the t′th and tth sampling points of the features of the previous layer and current layer, i is the number of layers, C_{i} denotes the number of channels of the ith layer (the number of channels at the input layer is 1, and the number of channels at other convolutional layers is 20), s is the moving step length of convolution kernel, Z is the node value of the network, W denotes the weight, and b represents the bias term. Through 7 convolutional layers, the input data (1, 400) changes to (20, 4), which means that the network extracts 20 sets of features of the input data, and each set of features has a length of 4.
The 20 sets of features are flattened into a characteristic vector (1, 80), and after passing through a fully connected layer, a vector [Z_{0}, Z_{1}] with a length of 2 is obtained (formula(3), Bengio Y., 2009). Two elements of the vector, Z_{0} and Z_{1}, are respectively noise and earthquake score values. Finally, Softmax function is used to normalize the score value into a probability vector [p_{0}, p_{1}] (formula (4), Bengio Y., 2009). c in the formula takes the value of 0 or 1, indicating respectively noise or earthquake, p_{C} represents the probability of two kinds, and p_{0}+p_{1}=1. When p_{0} is greater than 0.5, CNN judges the input sample as noise, and when p_{1} is greater than 0.5, CNN judges the input sample as earthquake.
$ {Z_c} = \sum\nolimits_{c' = 1}^{80} {Z_{c'}^{\bar 7}} \cdot w_{cc'}^8 + b_c^8,c \in \{ 0,1\} $  (3) 
$ {p_C} = \frac{{\exp \left( {{Z_C}} \right)}}{{\sum\nolimits_{k = 0}^1 {\exp } \left( {{Z_k}} \right)}} $  (4) 
We use the crossentropy loss function (formula (5), Gu Jinxiang et al., 2015) to optimize the entire network, where N is the number of samples. The crossentropy loss function can show the difference between the probability distribution predicted by the model p^{k} and ideal probability distribution q^{k}. For each sample, the ideal probability distribution can be expressed as formula (6) (Gu Jinxiang et al., 2015), and C in the formula takes the value of 0 or 1, indicating respectively noise or earthquake. In order to avoid overfitting caused by too large network parameters, we add secondorder norm penalty term of the weight matrix to the loss function to regularize the network, and the regularization factor λ=0.001.
$ L = \frac{1}{N}\sum\nolimits_{k = 1}^N {\sum\nolimits_{c = 0}^1 {q_c^k} } \log \left( {p_c^k} \right) + \lambda \sum\nolimits_{i = 1}^8 {{W^i}_2^2} $  (5) 
$ q_c^k = \left\{ {\begin{array}{*{20}{c}} {1,{\mathop{\rm class}\nolimits} (k) = c}&{}\\ {0,\;{\mathop{\rm class}\nolimits} (k) \ne c}&{} \end{array}} \right. $  (6) 
Because of the large number of network parameters and training data, in order to improve the calculation efficiency, we employ the Adam (Adaptive Moment Estimation) algorithm (Kingma D. P. et al., 2014) to minimize the loss function L. The Adam algorithm is an improved stochastic gradient descent algorithm, which can update neural network parameters iteratively based on training data. In formula (5), N is 512, which means 512 samples are randomly filled into the network for each step of training. The backpropagation algorithm is used to calculate the gradient of L function with respect to network parameters, and then update the network parameters are adjusted according to the gradient direction. The training process is repeated to make the L function increasingly smaller. The learning rate during training is set at 0.0001. If the learning rate is too high, the model is difficult to converge, and if the learning rate is too small, the convergence speed is too slow. As the training continues, the loss function converges rapidly from 0.87 and then gradually slows down, and after 20, 000 steps of training, L already converges to 0.038. At this point, it can be considered that the gap between the ideal output and the actual output is small enough, indicating that the network already has good fitting ability (Fig. 4). The hardware used in the training is GTX 1080 Ti GPU produced by NVIDIA Corporation, and the training takes about 50min for 20, 000 steps. When there are too many iterations, the accuracy of the neural network on the training set will still increase, but its generalization ability will decrease, and overfitting occurs, meaning that its performance on the test will decline. It is necessary to constantly monitor the accuracy of the neural network for the validation set or test set. When the performance of the model for the validation set or test set does not improve, the training is terminated in advance to avoid overfitting. After repeated tests and analysis, it is found that the model with 3, 000 steps of training has the best performance for the test set, therefore, when applying the training results, we select the convolutional neural network model obtained after 3, 000 steps of training.
After the convolutional neural network is trained to reach the convergence state, we use the test set to detect the recognition accuracy on activesource seismic signals or noise signals. The recognition results output by classifier on the test set can be divided into four cases: TP—the number of earthquake samples recognized as earthquakes, FN—the number of earthquake samples recognized as noise, FP—the number of noise samples recognized as earthquakes, TN—the number of noise samples recognized as noise, in which P and N represent the number of positive and negative samples respectively. Generally, there are three indicators for the evaluation of the recognition ability of the neural network model:
$ \begin{array}{*{20}{c}} {{\rm{ Accuracy }} = \frac{{TP + TN}}{{P + N}}}\\ {{\rm{ Precision }} = \frac{{TP}}{{TP + FP}}}\\ {{\rm{ Recall }} = \frac{{TP}}{{TP + FN}}} \end{array} $  (7) 
where the accuracy rate indicates the ratio between the number of correctly recognized samples in the test set and the total number of samples in the test set, whether positive or negative; The precision rate is the ratio of the number of positive samples correctly predicted and the number of positive samples predicted by the model; The recall rate indicates the ratio between the number of positive samples correctly predicted and the total number of positive samples in the test set. The higher the values of these three quantities are, the stronger the recognition ability of CNN is.
Test samples are inputted into the trained CNN, and a probability vector with a length of 2 is output, [p_{0}, p_{1}], (formula (4)), where, p_{0} and p_{1} denote the probability that the sample should be tested for noise and earthquake respectively. When p_{0} is greater than 0.5, CNN judges the input sample as noise, and when p_{1} is greater than 0.5, CNN judges the input sample as earthquake. We have calculated all the indicators of CNN in the test set, and the test results show that the accuracy, precision and recall rates of CNN are all above 98% after 1000 steps of training (Table 1), indicating that the CNN has been able to accurately distinguish the two kinds of signals by extracting the characteristics of active source earthquakes and noise signals. However, the accuracy rate of the CNN in identifying natural microquakes is only 55%—73%, the recall rate 62%—77% (Zhao Ming et al., 2018), which shows that the CNN can achieve higher precision in identifying active source earthquakes. This may be related to the more complex characteristics of the natural seismic records, because natural seismic records are affected by focal mechanism, epicentral distance, magnitude and other factors, which are uncontrollable and variable, so it is difficult to ensure the completeness of the training samples when constructing it. In comparison with natural earthquakes, active source earthquakes have the advantages of high controllability in excitation times, location and source characteristics, high repeatability and intensive observation. All things considered, active source earthquakes can realize highprecision wave velocity structure exploration of subsurface media, which effectively makes up for the low resolution of possive source imaging at shallow depths. CNN can achieve good performance in the automatic recognition of active source earthquakes and has a wide range of potential applications.
After the CNN has been trained, in order to apply it to the picking of first arrivals, we scan continuous seismic records using the trained CNN. Starting from the initial moment, CNN scans point by point with a time window of 2s. For example, the first time window is to detect seismic records from 0s to 2s and the second time window is to detect seismic records from 0.005s to 2.005s. For each time window, the model will output the probability that the time window is a seismic sample or the middle of the time window is the first arrival time of the earthquake (Fig. 5). Because when constructing earthquake samples, the firstarrival time is at the middle of the sampling time window. In order to test the effectiveness of this strategy, four continuous seismic records from the same seismic station and different seismic sources (not participating in the training) are connected together as seismic records to be detected to test the effectiveness of earthquake picking of the model. As shown in Fig. 6, CNN scans seismic records to be detected (Fig. 6(a)) and gives the detection value of probability of earthquakes at the corresponding moments (Fig. 6(b)). On the whole, CNN can pick up the first arrival time of Pwave well in continuous seismic records.
In order to further test the antinoise ability of this autopicking strategy, we add noise to the seismic waveform records, and scan the seismic records added with noise still by CNN to obtain the probability detection curves, which are compared with STA/LTA curves. The results show that the trained CNN model can still pick up seismic signals accurately, and its picking ability is better than the STA/LTA method. Both methods can pick up seismic signals when the SNR of records is high. But when the noise increases, the value of the STA/LTA function will decrease. At this point, it may not be able to pick up or the threshold value needs to be manually adjusted for picking. However, the CNN method we constructed can still provide a prediction value close to 1 for seismic signals with poor signaltonoise ratio (Fig. 7). Thus, it can be seen that the CNN is more tolerant of noise than the STA/LTA method.
In order to determine the precision of first arrival picking of the CNN, we calculate the rootmeansquare error between CNN, STA/LTA automatic picking of arrival times and manual picking (formula (8)), and t(i) and
$ {\mathop{\rm RMSE}\nolimits} = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^N {(t(} i)  \hat t(i){)^2}} $  (8) 
In order to test the picking precision of the CNN on real seismic records, we use this model to pick up 400 real seismic records (not participating in the training), and calculate the error distribution of the picking of the Pwave first arrivals of the real seismic records (Fig. 10). The results show that the average error is 0.015±0.098s, and 71.75% of the picking error is within 0.1s. For the Zhuxi mining area, the velocity measurement results of rock core samples show that the velocity in deep Carboniferous skarn (v_{P}:6.80km/s) and Neoproterozoic phyllitic siltstone (v_{P}:5.85km/s) is higher, while velocity in shallow bioclastic limestone (v_{P}:3.57km/s), surrounding marble (v_{P}:2.73km/s) and surrounding dolomite (v_{P}:4.61km/s) varies significantly. Assuming that the average velocities of the highvelocity and lowvelocity rock mass are 6.0km/s and 3.5km/s respectively, the travel time difference of 0.1s requires the seismic wave to travel 0.1/(1/3.51/6.0)=0.84km in medium of different velocities, which means that the resolution of the average picking error of 0.1s in this area can reach~840m. Considering that the spacing of the seismic array used is 1km, and the spatial span of the late Paleozoic carbonate caprocks is greater than 1km (Chen Guohua et al., 2016), the trained CNN obtained in this study for automatic picking of active source Pwave first arrivals can be used for spatial distribution imaging of carbonate sedimentary cover in the Zhuxi mining area.
In addition, we calculate the rootmeansquare errors of CNN and STA/LTA method in the case of the 400 real seismic records added with different levels of noise (Fig. 11). Comparison shows that the CNN method we constructed has better antinoise ability than the STA/LTA method. With the increase of noise, STA/LTA picking error increases rapidly, while CNN picking error grows slowly. When Gaussian noise with standard deviation of 0.2 is added to the real seismic records, CNN can still achieve good picking precision, while the STA/LTA picking error is large. Fig. 12 shows the picking results of this model for seismic records recorded by different stations in an earthquake in the test set (partially enlarged view). When Gaussian noise is not added and added with standard deviations of 0.1, 0.2 and 0.3 respectively to the real seismic channel set, the rootmeansquare errors are respectively 0.02s, 0.03s, 0.04s and 0.07s. It can be seen that the CNN can also maintain high accuracy and strong antinoise ability in picking real seismic records.
In this study, a training sample set is constructed by manually picking up the Pwave first arrivals and interception of seismic time window and noise time window, and a convolutional neural network which can automatically recognize two kinds of time windows is established. The network can output continuously the probability of each time window as an earthquake by scanning waveform records, and the middle of the time window with the maximum probability is taken as the firstarrival time of Pwave. By comparing this method with the STA/LTA method, the following conclusions can be drawn:
(1) The convolutional neural network has the ability to distinguish active source seismic signals from noise. For the data used in this study, after 3000 steps of training, the accuracy of the converged CNN can reach more than 99% in identifying earthquakes and noise. Compared with natural microquake recognition, CNN can achieve higher accuracy in recognizing active source seismic signals.
(2) The convolutional neural network can be used to pick up the firstarrival time of the Pwave. Compared with the STA/LTA method, the CNN is more tolerant of noise. Through a series of comparative experiments, it is found that when noise increases, the CNN has a higher precision in picking than the STA/LTA method.
The study results show that the CNN method is effective in automatically picking up Pwave first arrivals of active source signals, and can be applied to followup study of seismic exploration based on active sources and imaging of subsurface fine structure. Compared with traditional methods, CNN completely avoids human intervention in the process of picking first arrivals, providing an effective means for rapid processing of a large number of data. As a deep learning method, CNN's performance in recognizing or picking up seismic signals will inevitably improve with the increase of training samples, which is also its advantage over other methods for seismology in the age of big data. Moreover, in this study, active source seismic data recorded by a wide range of passivesource(including natural earthquakes and noise sources) array are used for automatic pickup of Pwave first arrivals, which may be used for joint inversion and imaging of active and passive sources.
ACKNOWLEDGEMENTSeismic data are from the "Underground Bright Lamp" project of the mineral prospecting experiment in Jiangxi, led by Academician Chen Yong, vibroseis is provided and operated to stimulate seismic signals by the BGP INC., China National Petroleum Corporation, the seismic array for continuous observation is set up by the Geophysical Exploration Center, China Earthquake Administration, measurement of rock wave velocity is done by the Institute of Geophysics, China Earthquake Administration, and rock core samples used are provided by the 912 Brigade of Jiangxi Bureau of Geology and Mineral Resources, here we extend our deepest thanks to them. Thanks to Wang Weitao and two anonymous reviewers for their suggestions for revision.
Allen R.V. Automatic earthquake recognition and timing from single trace[J]. Bulletin of the Seismological Society of America, 1978, 68(5): 15211532. 
Bao Xuewei, Eaton D.W. Fault activation by hydraulic fracturing in western Canada[J]. Science, 2016, 354(6318): 14061409. DOI:10.1126/science.aag2583 
Baer M., Kradolfer U. An automatic phase picker for local and teleseismic events[J]. Bulletin of the Seismological Society of America, 1987, 77(4): 14371445. 
Bengio Y. Learning deep architectures for AI[J]. Foundations and Trends^{©} in Machine Learning, 2009, 2(1): 1127. DOI:10.1561/2200000006 
Brown J.R., Beroza G.C., Shelly D.R. An autocorrelation method to detect low frequency earthquakes within tremor[J]. Geophysical Research Letters, 2008, 35(16). DOI:10.1029/2008GL034560 
Chen Guohua, Shu Liangshu, Shu Limin, Zhang Cheng, Ouyang Yongpeng. Geological characteristics and mineralization setting of the Zhuxi tungsten (copper) polymetallic deposit in the Eastern Jiangnan Orogen[J]. Science China Earth Sciences, 2016, 59(4): 803823. DOI:10.1007/s1143001552009 
DeVries P.M.R., Viégas F., Wattenberg M., Meade B.J. Deep learning of aftershock patterns following large earthquakes[J]. Nature, 2018, 560(7720): 632634. DOI:10.1038/s415860180438y 
Gibbons S.J., Ringdal F. The detection of low magnitude seismic events using arraybased waveform correlation[J]. Geophysical Journal International, 2006, 165(1): 149166. DOI:10.1111/j.1365246X.2006.02865.x 
Gu Jinxiang, Wang Zhenhua, Jason K., Ma Lianyang, Shahroudy A., Shuai Bing, Liu Ting, Wang Xingxing, Wang Li, Wang Gang, Cai Jianfei, Chen Tsuhan. Recent advances in convolutional neural networks[J]. arXiv preprint arXiv: 1512.07108, 2015.

He Xianlong, She Tianli, Gao Feng. A new method for picking up arrival times of seismic P and S waves automatically[J]. Chinese Journal of Geophysics, 2016, 59(7): 25192527 (in Chinese with English abstract). DOI:10.6038/cjg20160717 
Hou Jinxin, Wang Baoshan. Temporal evolution of seismicity before and after the 2014 Ludian M_{S}6.5 earthquake[J]. Chinese Journal of Geophysics, 2017, 60(4): 14461456 (in Chinese with English abstract). 
House L. Locating microearthquakes induced by hydraulic fracturing in crystalline rock[J]. Geophysical Research Letters, 1987, 14(9): 919921. DOI:10.1029/GL014i009p00919 
Jiang Ce, Wu Jianping, Fang Lihua. Earthquake detection and automatic phase picking[J]. Acta Seismologica Sinica, 2018, 40(1): 4557 (in Chinese with English abstract). DOI:10.11939/jass.20170093 
Kato A., Obara K., Lgarashi T., Tsuruoka H., Nakagawa S., Hirata N. Propagation of slow slip leading up to the 2011 M_{w} 9.0 TohokuOki earthquake[J]. Science, 2012, 335(6069): 705708. DOI:10.1126/science.1215141 
Kim Y. Convolutional neural networks for sentence classification[J]. arXiv preprint arXiv: 1408.5882, 2014.

Kingma D.P., Ba J. Adam: a method for stochastic optimization[J]. arXiv preprint arXiv: 1412.6980, 2014.

LeCun Y., Bottou L., Bengio Y., Haffner P. Gradientbased learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 22782324. DOI:10.1109/5.726791 
Li Lu, Wang Baoshan, Hou Jinxin. Applications of matched filter technique in seismic data processing[J]. Earthquake Research in China, 2017, 33(1): 1422 (in Chinese with English abstract). 
Liu Han, Zhang Jianzhong. STA/LTA algorithm analysis and improvement of Microseismic signal automatic detection[J]. Progress in Geophysics, 2014, 29(4): 17081714 (in Chinese). DOI:10.6038/pg20140429(inChinese) 
Liu Xiqiang, Zhou Huilan, Shen Ping, Ma Yanlu, Li Hong. Identification method of seismic phase in threecomponent seismograms on the basis of wavelet transform[J]. Acta Seismologica Sinica, 2000, 22(2): 125131 (in Chinese with English abstract). DOI:10.3321/j.issn:02533782.2000.02.002 
Liu Xiqiang, Zhou Yanwen, Qu Junhao, Shi Yuyan, Li Bo. Realtime detection of regional events and automatic Pphase identification from the vertical component of a single station record[J]. Acta Seismologica Sinica, 2009, 31(3): 260271 (in Chinese with English abstract). DOI:10.3321/j.issn:02533782.2009.03.003 
Maeda N. A method for reading and checking phase time in autoprocessing system of seismic wave data[J]. Zisin (Journal of the Seismological Society of Japan. 2nd ser.), 1985, 38(3): 365379. DOI:10.4294/zisin1948.38.3_365(inJapanese) 
Peng Zhigang, Zhao Peng. Migration of early aftershocks following the 2004 Parkfield earthquake[J]. Nature Geoscience, 2009, 2(12): 877881. DOI:10.1038/ngeo697 
Perol T., Gharbi M., Denolle M. Convolutional neural network for earthquake detection and location[J]. Science Advances, 2018, 4(2): e1700578. DOI:10.1126/sciadv.1700578 
Rawat W., Wang Zenghui. Deep convolutional neural networks for image classification:a comprehensive review[J]. Neural Computation, 2017, 29(9): 23522449. DOI:10.1162/neco_a_00990 
Shelly D.R., Beroza G.C., Ide S. Nonvolcanic tremor and lowfrequency earthquake swarms[J]. Nature, 2007, 446(7133): 305307. DOI:10.1038/nature05666 
Shensa M.J. The deflection detectorits theory and evaluation on shortperiod seismic data[J]. Texas Instruments, Alexandria, Virginia, 1977, TR7703.

Sleeman R., van Eck T. Robust automatic Pphase picking:an online implementation in the analysis of broadband seismogram recordings[J]. Physics of The Earth and Planetary Interiors, 1999, 13(1/4): 265275. DOI:10.1016/S00319201(99)000072 
Stevenson P.R. Microearthquakes at flathead lake, Montana:a study using automatic earthquake processing[J]. Bulletin of the Seismological Society of America, 1976, 66(1): 6180. 
Stockwell R.G., Mansinha L., Lowe R.P. Localization of the complex spectrum:the S transform[J]. IEEE Transactions on Signal Processing, 1996, 44(4): 9981001. DOI:10.1109/78.492555 
Tan Yuyang, Yu Jing, Feng Gang, He Chuan. Arrival picking of microseismic events using the SLPEA algorithm[J]. Chinese Journal of Geophysics, 2016, 59(1): 185196 (in Chinese with English abstract). DOI:10.6038/cjg20160116 
Wang Haijun, Jin Ping, Liu Guizhong, Wang Xiaoming. Accurate estimation for arrival time of seismic wave[J]. Northwestern Seismological Journal, 2003, 25(4): 298303 (in Chinese with English abstract). DOI:10.3969/j.issn.10000844.2003.04.003 
Wang Peng, Chang Xu, Wang Yibo, Wang Luchen, Zhai Hongyu. Automatic event detection and event recovery in low SNR microseismic signals based on timefrequency sparseness[J]. Chinese Journal of Geophysics, 2014, 57(8): 26562665 (in Chinese with English abstract). DOI:10.6038/cjg20140824 
Wang Weitao, Wang Baoshan, Jiang Shengmiao, Hu Jiupeng, Zhang Yuansheng. A perspective review of seismological investigation on the crust at regional scale using the active Airgun source[J]. Journal of Seismological Research, 2017, 40(4): 514524 (in Chinese with English abstract). DOI:10.3969/j.issn.10000666.2017.04.002 
Yu Ziye, Chu Risheng, Sheng Minhan. Pick onset time of P and S phase by deep neural network[J]. Chinese Journal of Geophysics, 2018, 61(12): 48734886 (in Chinese with English abstract). DOI:10.6038/cjg2018L0725 
Zhao Dapeng, Liu Xiqiang, Li Hong, Zhou Yanwen. Detection of regional seismic events by kurtosis method and automatic identification of direct Pwave first motion by kurtosisAIC method[J]. Journal of Seismological Research, 2012, 35(2): 220226 (in Chinese with English abstract). DOI:10.3969/j.issn.10000666.2012.02.010 
Zhang Miao, Wen Lianxing. An effective method for small event detection:match and locate (M&L)[J]. Geophysical Journal International, 2015, 200(3): 15231537. DOI:10.1093/gji/ggu466 
Zhao Ming, Chen Shi, Zhang Bei, Yuen D.A. Automatic identification of aftershocks by convolutional neural networks and its application in waveform detection of China seismic network[J]. Recent Developments in World Seismology, 2018(8): 4243, doi: 10.3969/j.issn.02534975.2018.08.035(in Chinese with English abstract).

Zhu Weiqiang, Beroza G.C. PhaseNet:a deepneuralnetworkbased seismic arrivaltime picking method[J]. Geophysical Journal International, 2019, 216(1): 261273. DOI:10.1093/gji/ggy423 