Earthquake Research in China  2019, Vol. 33 Issue (1): 37-51     DOI: 10.19743/j.cnki.0891-4176.201901003
Simultaneous Denoising and Interpolation of Seismic Data via the Deep Learning Method
GAO Han, ZHANG Jie     
University of Science and Technology of China, Hefei 230026, China
Abstract: Utilizing data from controlled seismic sources to image subsurface structures and invert the physical properties of subsurface media is a major effort in exploration geophysics. Dense seismic records with a high signal-to-noise ratio (SNR) and high fidelity help to produce high-quality imaging results. Therefore, seismic data denoising and missing-trace reconstruction are significant for seismic data processing. Traditional denoising and interpolation methods commonly rely on noise level estimations, thus requiring heavy manual work to deal with records and to select optimal parameters. We propose a simultaneous denoising and interpolation method based on deep learning. For noisy records with missing traces, we adopt an iterative alternating optimization strategy and separate the objective function of the data restoration problem into two sub-problems. The seismic records can be reconstructed by solving a least-squares problem and applying a set of pre-trained denoising models alternately and iteratively. We demonstrate this method with synthetic and field data.
Key words: Deep learning     Convolutional neural network     Denoising     Data interpolation     Iterative alternating    

INTRODUCTION

Denoising is one of the fundamental efforts in seismic data processing. During seismic wave propagation, wavefields with various components carry abundant information about the subsurface media, which provides data for geophysical studies. Seismic signals are often contaminated by random noise, which mainly comes from the environment. Some seismic components are easily damaged because their representation in the record is delicate. For example, part of the reflected waves is usually obscured by surface waves and random noise. Thus suppressing noise and extracting signal structures (e.g. smooth contours, edges, and directional details in signals) is important and challenging. In addition, dense records are required by the subsequent data processing and imaging steps. Unfortunately, seismic data observed in the field often includes bad or missing traces due to economic and physical constraints. Thus seismic data interpolation is also essential for data processing.

In the past few decades, many methods from different fields have been proposed to remove random noise from seismic data. In geophysics, traditional denoising methods are often applied in a transform domain. Canales L. L. (1984) presented a denoising approach in the 2-D frequency-spatial (f-x) domain to suppress random noise in seismic records. Chanerley A. A. et al. (2002) applied the stationary wavelet transform to the denoising procedure. Candès E. J. et al. (2000) introduced the curvelet transform, which provides an optimal sparse representation of events with second-order continuously differentiable singularities. The curvelet transform represents the edges of curves more precisely; for discontinuous events, curvelets can achieve higher precision than standard wavelets (Hennenfent G. et al., 2006). In recent years, several image denoising techniques based on machine learning have been applied to seismic data, such as the nonlocal means algorithm (Bonar D. et al., 2012), the rank-reduction algorithm (Chen Yangkang et al., 2016), and dictionary learning (Tang Gang et al., 2012; Beckouche S. et al., 2014). These methods all require estimating the noise level, and the denoising parameters are then selected accordingly. If the estimated noise level is too high, valid information can be damaged; if it is too low, the noise cannot be removed effectively. Unfortunately, field data usually includes various noise distributions. Near-offset seismic records usually achieve a high SNR, while far-offset data suffers from a low SNR, and the SNR of adjacent traces may differ as well. In such cases, noise-level-based denoising methods cannot achieve satisfactory performance. Besides, the noise recorded in real cases usually occupies frequency bands similar to those of the signals and cannot be removed effectively by most traditional methods. All these issues bring great challenges to denoising.

Algorithms based on transform domains are also widely used in seismic data interpolation. Spitz S. (1991) presented the f-x domain interpolation method for linear events. Curved events can be processed with windows, but the window parameters strongly influence the interpolation quality. This method was extended to other domains, e.g. the time-spatial domain (Claerbout J. F., 2004), the frequency-wavenumber domain (Gülünay N., 2003), and the curvelet domain (Naghizadeh M. et al., 2010). In addition, algorithms using sparse transforms, e.g. the Fourier transform (Sacchi M. D. et al., 1998; Liu Bin et al., 2004), the curvelet transform (Herrmann F. J. et al., 2008), and dictionary learning (Liang Jingwei et al., 2014; Yu Siwei et al., 2015b), have also become very popular. These methods perform well in certain cases, but often suffer from limited precision and efficiency.

Hinton G. E. et al. (2006) presented the concept of deep learning, pointing out that neural networks can extract and learn the features of data. LeCun Y. et al. (1998) proposed the Convolutional Neural Network (CNN). CNN possesses the properties of local connection and weight sharing, which helps to reduce the number of parameters and improve training efficiency. Yu F. et al. (2015a) introduced the idea of dilated convolution to enlarge the network receptive field and improve performance. Krizhevsky A. et al. (2012) proposed a deep neural network that achieved great success in image processing (Zeiler M. D. et al., 2014). However, deep networks suffer from the famous vanishing gradient problem (Bengio Y. et al., 1994; Glorot X. et al., 2010), so deeper networks do not necessarily obtain higher precision. To solve this problem, He Kaiming et al. (2016) showed that one can learn a residual mapping through a set of residual units instead of directly learning the required mapping. Zhang Kai et al. (2017) proposed the DnCNN network, which consists of convolutional layers, to perform image denoising. They adopted the concept of residual learning, but employed a single residual unit to learn the residual between noisy and clean images. The residual learning strategy does not introduce extra parameters, but can largely boost the denoising performance.

In our study, we propose a 7-layer CNN architecture to perform seismic data denoising. The network consists only of convolutional layers. We combine the strategies of batch normalization, residual learning, and dilated convolution to improve the denoising result. The performance of these algorithms is validated by synthetic tests. In comparison with conventional denoising methods such as wavelet and curvelet methods, our deep learning method can handle data with various SNRs adaptively while achieving better denoising performance. For data contaminated by noise whose frequency band is similar to that of the signals, our method can also recover the signals reasonably well. On the basis of the CNN denoising method, we further adopt an iterative alternating strategy to perform denoising and interpolation simultaneously. The original problem is split into two optimization sub-problems by an iterative alternating method (Bauschke H. H. et al., 2006). The reconstructed records can then be obtained by solving a least-squares problem and applying a set of pre-trained CNN denoising models alternately and iteratively.

1 METHOD

1.1 Seismic Data Denoising Based on Deep Learning

In a Convolutional Neural Network, convolutional layers are used to extract the features of the input. A set of convolution filters slides over and performs convolution operations on the input data. The output after the convolutions is called the feature map of the current layer. Deeper convolutional layers can extract more complex features. By utilizing given inputs and labels to train the network, the weights of the convolution filters can be updated iteratively through backward propagation and the chain rule (Rumelhart D. E. et al., 1986). Therefore, the predicted output of the network gradually fits the true data (labels) through the training process.

Considering a CNN architecture which only consists of convolutional layers, the output of the convolutional layer l is expressed as:

$ {a^l} = \sigma ({z^l}) = \sigma ({a^{l - 1}} * {W^l} + {b^l}) $ (1)

where a^{l-1} is the input of layer l and the output of layer l-1, W^l denotes the weights of the convolution filters for layer l, b^l is the bias of layer l, and z^l = a^{l-1} * W^l + b^l is the output before applying the activation function. σ is the activation function, which introduces nonlinearity into the network. We utilize the Rectified Linear Unit (ReLU) as the activation function in our study, which can be expressed as σ(x)=max(0, x). The L2-norm residual is adopted as the loss function. For the last layer L, we have:

$ J\left({W, b, y} \right) = \frac{1}{2}\left\| {{a^L} - y} \right\|_2^2 = \frac{1}{2}\left\| {\sigma ({a^{L - 1}} * {W^L} + {b^L}) - y} \right\|_2^2 $ (2)

The gradients of W and b can be derived utilizing the chain rule:

$ \frac{\partial J\left(W, b, y\right)}{\partial W^l} = \frac{\partial J\left(W, b, y\right)}{\partial z^l}\frac{\partial z^l}{\partial W^l} = a^{l-1} * \left(\frac{\partial J\left(W, b, y\right)}{\partial z^L}\frac{\partial z^L}{\partial z^{L-1}}\frac{\partial z^{L-1}}{\partial z^{L-2}} \cdots \frac{\partial z^{l+1}}{\partial z^l}\right) $ (3)
$ \frac{\partial J\left(W, b, y\right)}{\partial b^l} = \frac{\partial J\left(W, b, y\right)}{\partial z^l}\frac{\partial z^l}{\partial b^l} = \frac{\partial J\left(W, b, y\right)}{\partial z^L}\frac{\partial z^L}{\partial z^{L-1}}\frac{\partial z^{L-1}}{\partial z^{L-2}} \cdots \frac{\partial z^{l+1}}{\partial z^l} $ (4)
$ \frac{\partial z^{l+1}}{\partial z^l} = \frac{\partial \left(\sigma(z^l) * W^{l+1} + b^{l+1}\right)}{\partial z^l} = (W^{l+1})^T \cdot \sigma'(z^l) $ (5)

where (W^{l+1})^T is effectively a 180°-rotated version of the filter W^{l+1}. Notice that in equations (3) and (4), the derivative of the loss function with respect to the output of the last layer, z^L, can be calculated directly:

$ \frac{{\partial J\left({W, b, y} \right)}}{{\partial {z^L}}} = ({a^L} - y) \cdot \sigma ' ({z^L}) $ (6)

Therefore, starting from layer L, we can derive the gradients of W and b of each layer by applying equations (3)-(5) in reverse. The network parameters (W and b) can then be updated by applying optimization algorithms.
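To make equation (1) and the update step concrete, the following is a minimal single-channel sketch in Python; the helper names and simplified shapes (one filter, one channel) are our assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def relu(x):
    # ReLU activation: sigma(x) = max(0, x)
    return np.maximum(0.0, x)

def conv_layer_forward(a_prev, W, b):
    # Equation (1): a^l = sigma(a^{l-1} * W^l + b^l).
    # Zero padding ("same") keeps the feature map the same size as the input.
    z = convolve2d(a_prev, W, mode="same") + b
    return relu(z), z

def gradient_step(W, b, dW, db, lr=1e-3):
    # Parameter update once the gradients from equations (3)-(4) are available.
    return W - lr * dW, b - lr * db
```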

The convolutional neural network is widely applied to image classification, where the predicted outputs are the probabilities of different classes. For seismic data denoising, however, we adopt an end-to-end network, which means the size of the output is the same as that of the input (Fig. 1). Zero padding is applied before convolutions to keep the size of the data in the middle layers. We utilize the noisy records and the original noise-free records to train the network. By updating the network parameters using the algorithm above, we finally obtain a CNN denoising model which can gradually separate signal structures from the noisy observations.

Fig. 1 Network architecture for denoising task BN denotes batch normalization, ReLU is the rectified linear unit, n-Dconv represents n-dilated convolution

We adopt the network architecture shown in Fig. 1. Layers 1-6 are convolutional layers with the ReLU activation function. The last convolutional layer is used to reconstruct the output. We utilize 64 dilated convolution filters of size 3×3 in each layer. The dilation factors are set to 1, 2, 3, 4, 3, 2, 1, and batch normalization (Ioffe S. et al., 2015) and the residual learning strategy (Zhang Kai et al., 2017) are further applied in the network training process.
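A sketch of this architecture in PyTorch is shown below. It follows the description above (64 filters of size 3×3, dilation factors 1, 2, 3, 4, 3, 2, 1, batch normalization in layers 2-6), while details such as bias settings and the single-channel input are our assumptions.

```python
import torch.nn as nn

class DilatedDnCNN(nn.Module):
    """Sketch of the 7-layer dilated denoising CNN in Fig. 1."""
    def __init__(self, channels=1, features=64):
        super().__init__()
        dilations = [1, 2, 3, 4, 3, 2, 1]
        layers = []
        # Layer 1: dilated convolution + ReLU (no batch normalization).
        layers += [nn.Conv2d(channels, features, 3,
                             padding=dilations[0], dilation=dilations[0]),
                   nn.ReLU(inplace=True)]
        # Layers 2-6: dilated convolution + batch normalization + ReLU.
        # padding=d keeps the spatial size for a 3x3 filter with dilation d.
        for d in dilations[1:-1]:
            layers += [nn.Conv2d(features, features, 3,
                                 padding=d, dilation=d, bias=False),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        # Layer 7: dilated convolution only, reconstructing the output.
        layers += [nn.Conv2d(features, channels, 3,
                             padding=dilations[-1], dilation=dilations[-1])]
        self.net = nn.Sequential(*layers)

    def forward(self, y):
        # Under residual learning (Section 1.1.3), the output is the predicted noise.
        return self.net(y)
```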

1.1.1 Dilated Convolution

Due to the convolution operations, a single unit in the feature map is only related to a small zone in the input of its layer (blue block in Fig. 2(a)). Through forward propagation, one unit in the network output can only perceive the information of a particular region of the input sample. The size of this region is called the receptive field. Enlarging the receptive field is crucial for network training. If a network has l convolutional layers and the convolution filter size is n×n, the receptive field is (n-1)×l+1. There are two ways to enlarge the receptive field: one is enlarging the size of the convolution filters, the other is increasing the depth of the network architecture. Both introduce more parameters and increase training time. To balance the network depth and the size of the receptive field, we apply dilated convolution (Yu F. et al., 2015a) in the convolution layers. An s-dilated convolution filter is effectively a sparse filter of size (2s+1)×(2s+1) in which only 9 units have non-zero values (Fig. 2(b) and Fig. 2(c)). Fig. 2 shows a simple 3-layer convolutional network in which the dilation factors are set to 1, 2 and 4. The receptive field increases exponentially with the application of dilated convolutions (blue boxes in Fig. 2).

Fig. 2 Illustration of a 3-layer dilated convolutional network (a) Layer 1: Conventional 3×3 convolution, equals to 1-dilated convolution; (b) Layer 2: 2-dilated 3×3 convolution; (c) Layer 3: 4-dilated 3×3 convolution. Yellow points denote the units performed convolution operations with the dilated convolution filters. Boxes of different colors show the contributions of each layer to the receptive field

Considering the network shown in Fig. 1, the effective filter sizes of the 7 layers are 3, 5, 7, 9, 7, 5, and 3, so the receptive field of the proposed network is 33×33. If we used a 7-layer DnCNN network with conventional 3×3 convolution filters (Zhang Kai et al., 2017), the receptive field would be only 15×15, and the network depth would have to be 16 to obtain a 33×33 receptive field. As a result, dilated convolution can largely reduce the number of network parameters for the same receptive field and save substantial training time. Because the size of the training samples should be larger than the receptive field, we utilize samples of size 40×40. Small samples provide more boundary information for the training process.
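The receptive field arithmetic can be checked with a few lines of Python (our sketch; each 3×3 layer with dilation d adds 2d to the side length):

```python
def receptive_field(dilations, kernel=3):
    # Each dilated convolution layer adds (kernel - 1) * dilation to the side length.
    return 1 + sum((kernel - 1) * d for d in dilations)

print(receptive_field([1, 2, 3, 4, 3, 2, 1]))  # 33 -> the proposed net sees 33x33
print(receptive_field([1] * 7))                # 15 -> a plain 7-layer 3x3 net
print(receptive_field([1] * 16))               # 33 -> depth 16 needed without dilation
```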

1.1.2 Batch Normalization

The neural network is very sensitive to the data distribution, and the convolution operations can change this distribution and influence the training process. Thus we apply the batch normalization method (Ioffe S. et al., 2015) to keep the distribution stable in layers 2-6. The batch normalization method normalizes the mean values and standard deviations of each data batch in the hidden layers. The algorithm enjoys several merits, such as fast training, better performance, and low sensitivity to initialization.

1.1.3 Residual Learning

Because noisy input data is quite similar to the clean data, if we learn the clean data directly, the network will behave like an identity mapping as it becomes deeper, which can lead to the performance degradation problem (He Kaiming et al., 2016). Therefore, we adopt the residual learning strategy proposed by Zhang Kai et al. (2017) and learn the residual (the difference between the noisy data and the latent clean data) instead. That is to say, the network output is the predicted value of the noise. Residual learning can speed up and stabilize the training process. Moreover, residual learning and batch normalization can benefit from each other in the denoising task (Zhang Kai et al., 2017).
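Under this strategy the training label becomes the noise itself. Below is a minimal PyTorch training step illustrating the idea; the mean-squared loss (in place of the ½·L2 form of equation (2)) and all names here are our assumptions.

```python
import torch.nn.functional as F

def residual_training_step(model, optimizer, noisy, clean):
    # Residual learning: the network is trained to predict the noise,
    # so the target is (noisy - clean) rather than the clean data itself.
    optimizer.zero_grad()
    predicted_noise = model(noisy)
    loss = F.mse_loss(predicted_noise, noisy - clean)
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference time, the denoised record is the input minus the predicted noise:
# denoised = noisy - model(noisy)
```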

1.2 Seismic Records Restoration Based on the CNN Denoising Models

We adopt an iterative alternating strategy to solve the data restoration problem. The objective function of data restoration can be separated into a set of denoising sub-problems and least-squares problems. The reconstruction result can then be generated by applying pre-trained CNN denoising models. Through this strategy, simultaneous denoising and interpolation of seismic data becomes feasible.

The seismic record y is modeled as:

$ y = Gx + d $ (7)

where G is a degradation matrix, x is the original seismic record, and d is additive noise. If G is an identity matrix, the data restoration problem reduces to a denoising problem. For interpolation tasks, y denotes data with missing traces, and G is a sampling matrix with only 0 and 1 values (an example is shown in Fig. 6(c)). The data restoration problem can be expressed as an optimization problem:

$ \hat x = {\rm{arg}}\mathop {\min }\limits_x \frac{1}{2}{\left\| {y - Gx} \right\|^2} + \lambda \phi (x) $ (8)
Fig. 6 A post-stack field dataset (a) Original shot record; (b) Shot record with 50% of traces regularly deleted, white lines represent missing traces; (c) Degradation matrix of the data reconstruction task, white points represent 0 and black points represent 1

in which λ is a trade-off parameter between the fidelity term $\frac{1}{2}{\left\| {y - Gx} \right\|^2}$ and the regularization term ϕ(x). Introducing an auxiliary variable p=x, equation (8) can be reformulated as a constrained optimization problem:

$ \hat x = {\rm{arg}}\mathop {\min }\limits_x \frac{1}{2}{\left\| {y - Gx} \right\|^2} + \lambda \phi (p), \;\;\;{\rm{s.t.}}\;p = x $ (9)

Thus, the objective function is given by:

$ l\left({x, p} \right) = \frac{1}{2}{\left\| {y - Gx} \right\|^2} + {\lambda _1}\phi \left(p \right) + \frac{{{\lambda _2}}}{2}{\left\| {p - x} \right\|^2} $ (10)

where λ1 and λ2 are weighting factors. Adopting an iterative alternating method (Bauschke H. H. et al., 2006), equation (10) can be split into two individual sub-problems which are solved alternately and iteratively:

$ {x_{k + 1}} = {\rm{arg}}\mathop {\min }\limits_x {\left\| {y - Gx} \right\|^2} + {\lambda _2}{\left\| {x - {p_k}} \right\|^2} $ (11)
$ {p_{k + 1}} = {\rm{arg}}\mathop {\min }\limits_p \frac{{{\lambda _2}}}{2}{\left\| {p - {x_{k + 1}}} \right\|^2} + {\lambda _1}\phi \left(p \right) $ (12)

We can solve equation (11) directly by the least-squares method:

$ {x_{k + 1}} = {({G^T}G + {\lambda _2}I)^{ - 1}}({G^T}y + {\lambda _2}{p_k}) $ (13)

while equation (12) can be rewritten as:

$ {p_{k + 1}} = {\rm{arg}}\mathop {\min }\limits_p \frac{1}{2\left(\sqrt{\lambda_1/\lambda_2}\right)^2}{\left\| {{x_{k + 1}} - p} \right\|^2} + \phi \left(p \right) $ (14)

We find that equation (14) is actually a denoising problem in which the noise level is $\sqrt {{\lambda _1}/{\lambda _2}} $. As a result, the original problem turns into a set of denoising sub-problems which can be solved by pre-trained CNN denoising models.

In our study, we utilize the Shepard interpolation result as the initial data for the first iteration. In each iteration, equations (13) and (14) are solved alternately. To solve equation (14), we train a set of CNN denoising models with uniformly-spaced noise levels. The denoising tasks are performed from high-noise-level models to low-noise-level models through the iterations. We adopt a fixed λ1, whose value is determined by the highest noise level of the data, while λ2 changes with the noise level of the CNN denoising model used in the current iteration. We find that a satisfactory performance can be obtained after 30 iterations.
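The whole procedure can be summarized in a short sketch. For a sampling matrix G, GᵀG is diagonal, so the least-squares step of equation (13) reduces to an elementwise division. The Shepard interpolation helper, the denoiser interface, and the parameter schedules below are assumptions, not the authors' code.

```python
import numpy as np

def restore(y, mask, denoise, sigmas, lam2s):
    """Iterative alternating restoration, eqs. (13)-(14).

    y       : observed record, with zeros at missing traces
    mask    : 0/1 array marking recorded traces, so G^T G = diag(mask)
    denoise : denoise(x, sigma) applies the pre-trained CNN model for noise level sigma
    sigmas  : noise-level schedule, from high to low (one per iteration)
    lam2s   : matching lambda_2 schedule (lambda_1 fixed, sigma = sqrt(lambda_1/lambda_2))
    """
    p = shepard_interpolation(y, mask)  # assumed helper for the initial guess
    for sigma, lam2 in zip(sigmas, lam2s):
        # Eq. (13): with G^T G diagonal, the least-squares solve is elementwise.
        x = (mask * y + lam2 * p) / (mask + lam2)
        # Eq. (14): a denoising sub-problem at the current noise level.
        p = denoise(x, sigma)
    return p
```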

2 SYNTHETIC AND FIELD DATA TESTS

We perform synthetic and field data tests to validate the proposed CNN denoising network and the simultaneous denoising and interpolation method. The training dataset consists of synthetic data generated by the 2-D finite difference method using the Marmousi model, the Foothill model, and layered models. Training patches from the layered models are adopted to help improve the denoising performance on linear events, while patches from the other two models improve the denoising performance on curved and complicated data. We perform the following tests using this dataset; the training patches from different models are mixed and randomly distributed in each training batch. The quality of the data is evaluated by the signal-to-noise ratio (SNR):

$ {\rm{SNR}}\left({\rm{dB}} \right) = 10{\log _{10}}\left({\frac{{{{\left\| I \right\|}^2}}}{{{{\left\| {{I_b} - I} \right\|}^2}}}} \right) $ (15)

where I is the original record and Ib is the noisy record.
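Equation (15) translates directly into code; a small sketch:

```python
import numpy as np

def snr_db(clean, noisy):
    # Equation (15): SNR(dB) = 10 * log10(||I||^2 / ||I_b - I||^2)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum((noisy - clean) ** 2))
```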

2.1 The CNN Denoising Tests

We evaluate the performance of batch normalization, residual learning, and different optimization solvers. The training samples are synthetic records with Gaussian noise (SNR=3dB). Utilizing the SGD solver (Robbins H. et al., 1951) and the Adam solver (Kingma D. P. et al., 2015), we perform the training with and without batch normalization, residual learning, and dilated convolution. The SGD solver is a stochastic gradient descent method which uses part of the data to update the parameters in each iteration; the solver is therefore fast and can partly avoid being trapped in local minima. The Adam solver calculates first-order and second-order moment estimates to set individual step lengths for different parameters.
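For reference, the two solvers can be instantiated in PyTorch as follows; the learning rates and momentum settings here are our assumptions, not values reported in the paper.

```python
import torch

model = DilatedDnCNN()  # the network sketched in Section 1.1

# Stochastic gradient descent: mini-batch updates, optionally with momentum.
sgd = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Adam: per-parameter step lengths from first- and second-order moment estimates.
adam = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
```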

For each case, we utilize 15,000 samples of size 40×40 and train for 50 iterations. The misfit curves of the training are shown in Fig. 3, where BN represents batch normalization, RL denotes residual learning, and Dilate-Conv is dilated convolution. Both the Adam and SGD solvers obtain satisfying training results when utilizing batch normalization and residual learning (blue line in Fig. 3). Without these two algorithms, the SGD solver does not converge, while the Adam solver still performs well (purple line in Fig. 3). Batch normalization accelerates the training process and boosts the performance (red line in Fig. 3), while residual learning greatly improves training stability (yellow line in Fig. 3). The application of dilated convolutions also contributes to better convergence (green line in Fig. 3). Therefore, we adopt the Adam solver in the following tests.

Fig. 3 The influence of batch normalization, different optimization methods, residual learning and dilated convolution on network training (a) Adam solver; (b) SGD solver

To validate the denoising performance of our network, we perform a test using a noise-free seismic gather extracted from a field dataset. Gaussian noise (Fig. 4(a) and 4(b)) and frequency-band-limited noise (Fig. 4(a) and 4(c)) are added to the original data respectively. Fig. 4(b) and Fig. 4(c) show the results of wavelet denoising, curvelet denoising, and our CNN denoising for the two types of noise. After numerous trials, we chose the Daubechies-10 wavelet and applied hard thresholding in the wavelet denoising. The training samples for the CNN denoising method are seismic records containing the corresponding type of noise. Fig. 4 shows that wavelet denoising can suppress most of the Gaussian noise, but the denoised events suffer from distortion. Moreover, the wavelet method can hardly remove frequency-band-limited noise, which widely exists in field data. The curvelet method removes Gaussian noise and preserves the edge information more effectively, but it can only suppress part of the frequency-band-limited noise. Our CNN denoising method preserves more subtle information of the signals than the two methods (Fig. 4(b)) and can recover the signals effectively from band-limited noise.

Fig. 4 The denoising performance of different methods (a) Original data, frequency distribution of original data, frequency distribution of Gaussian white noise, frequency distribution of band-limited noise; (b) Data with Gaussian white noise (SNR=3.77 dB), wavelet denoising (SNR=14.32 dB), curvelet denoising (SNR=20.22 dB), deep learning denoising (SNR=26.75 dB); (c) Data with band-limited noise (SNR=-0.69 dB), wavelet denoising (SNR=1.62 dB), curvelet denoising (SNR=4.43 dB), deep learning denoising (SNR=21.03 dB)

Then we perform a synthetic test to evaluate the influence of noise levels on our method. We select 10 different noise levels and generate 10 training datasets accordingly. In each dataset, 15,000 training samples of size 40×40 with one noise level are generated. We then obtain 10 CNN denoising models by training for 50 iterations on the 10 datasets. Each of these models removes noise of the corresponding noise level; we call them single-noise-level models. Furthermore, we train a new CNN denoising model utilizing 30,000 samples of size 40×40 in which the noise levels are in the range of 1-10; we call it the noise-range model. We select 20 shots and generate 10 testing data patches, each containing data with one noise level. The black line in Fig. 5 shows the SNR of the 10 testing patches. The dark blue line denotes the denoising result for the 10 testing patches using the corresponding single-noise-level models. Meanwhile, we process the 10 testing data patches utilizing the noise-range model; the result is displayed by the red line in Fig. 5. The green line and light blue line denote the results of wavelet denoising and curvelet denoising. Compared with the two traditional methods, the CNN denoising method achieves a higher SNR. Moreover, the noise-range model can handle data with different SNRs adaptively, and its denoising performance is comparable with that of the corresponding single-noise-level models.

Fig. 5 The denoising results of different models and methods The black line denotes the SNR of data containing different levels of noise. The dark blue line denotes the SNR of the denoising results by single-noise-level models, while the red line represents the SNR of the denoising result by the noise-range model. The green line shows the result of wavelet denoising, while the light blue line is the result of curvelet denoising
2.2 Interpolation and Denoising by Iterative Alternating Strategy

First, we perform an interpolation test for noise-free data. Before applying the iterative alternating strategy, we test the result of directly training CNN interpolation models. The network that we use is shown in Fig. 1. The training samples are seismic records with 50% missing data. We perform Shepard interpolation before feeding the samples into the network. The output to learn is the residual between the input data and the original data. Fig. 6(a) shows the original post-stack data from a field dataset. We regularly delete 50% of the traces (Fig. 6(b)) and then apply bicubic interpolation, Fourier transform interpolation, and our CNN interpolation model to reconstruct the data. The reconstruction results and residuals are shown in Fig. 7. The CNN interpolation model obtains data with a higher SNR and smaller residuals. However, for noisy data and data with irregularly missing traces, a large number of samples is required for training.

Fig. 7 The results of data reconstruction and residuals Color bar shows the absolute value of the residual. (a) (b) Bicubic interpolation, SNR=41.42dB; (c) (d) Fourier transform interpolation, SNR=47.78dB; (e) (f) Deep residual learning method, SNR=53.14dB; (g) (h) Alternate optimization method, SNR=57.82dB

To apply the proposed iterative alternating strategy, we train a set of CNN denoising models using the network in Fig. 1. The trained CNN denoising models cover the range of -15 dB to 30 dB with an interval of 5 dB. To reconstruct the data displayed in Fig. 6(b), we adopt the iterative alternating method and solve equation (13) and equation (14) alternately and iteratively. For the denoising sub-problems, we use the pre-trained CNN denoising models from -15 dB to 30 dB. In our tests, λ1 is fixed according to the highest noise level, while λ2 is calculated in each iteration from the noise level of the current denoising model. After 30 iterations, the interpolated results and residuals are shown in Fig. 7(g) and 7(h). In comparison with the other methods, the result of the iterative alternating method shows smaller residuals. Moreover, the proposed method recovers the edge information of discontinuous records with more stability, which is highly beneficial for the fidelity of seismic data.

Then, we perform a test to reconstruct the records from noisy data with missing traces. The original data is displayed in Fig. 6(a). As shown in Fig. 8(a), random noise of SNR=16.80 dB is added to the left part of the traces, while noise of SNR=3.17 dB is added to the right part. Moreover, we randomly delete 10% of the traces from the noisy data. Applying the iterative alternating method, we perform 30 iterations from the -15 dB model to the 0 dB model. The reconstructed data is shown in Fig. 8(b). The result shows that noise with different SNRs is suppressed effectively, and the SNR of the reconstructed data is largely improved.

Fig. 8 (a) Record with different levels of noise added in the left part (SNR=16.80 dB) and right part (SNR=3.17 dB), and 10% of traces irregularly deleted (white lines); (b) Result of simultaneous denoising and interpolation using the iterative alternating method (SNR=31.64 dB)

Furthermore, the proposed method is applied to a 3-D pre-stack field dataset. Fig. 9(a) shows an original shot gather with 30% missing traces; the far-offset traces suffer from stronger random noise. Performing the iterative alternating method, we obtain the reconstruction displayed in Fig. 9(b). The random noise is removed effectively while the signal structures are preserved, and the quality of the records is greatly improved. Through the simultaneous denoising and interpolation method, the efficiency of seismic data processing is also highly improved.

Fig. 9 The result of data reconstruction and denoising for a pre-stack dataset (a) Noisy record with 30% missing traces; (b) Result of simultaneous denoising and interpolation using the iterative alternating method
3 DISCUSSION AND CONCLUSIONS

In this study, we propose an adaptive denoising method based on deep learning to perform seismic data denoising. Compared to traditional methods, the proposed method shows two main advantages. First, traditional methods usually rely on noise level estimations; inaccurate estimations lead either to damage to the signals or to residual noise. Moreover, in field data cases, the SNR of seismic records usually varies with offset and recording time, and traditional methods cannot remove such noise while preserving the valid signals. Our method, however, can handle data with various SNRs adaptively and intelligently after one training process. Second, traditional methods cannot remove noise whose frequency bands are similar to those of the signals, while our method can suppress it effectively. The proposed method has the potential to remove other types of noise, such as surface waves, linear noise, and multiples, given adequate training samples.

Training samples with various features contribute to better denoising performance. We utilize training patches from the layered models to improve the denoising performance on linear events, and apply patches from the other two models to handle curved and complicated data. After training with data generated from the three models, the CNN model achieves satisfying denoising performance on new datasets with random noise. However, if the field data suffers from several different kinds of coherent noise intermixed with random noise, the CNN method would fail. Obtaining sufficient training samples which contain coherent and random noise similar to complicated field datasets is still challenging, and this may be solved by transfer learning in future work.

For noisy records with missing traces, we adopt an iterative alternating optimization strategy and separate the objective function of the data restoration problem into two sub-problems. The data reconstruction can then be performed utilizing the pre-trained denoising models. The reconstruction result of the proposed method is superior to those of conventional methods and of directly trained CNN interpolation models. For discontinuous records and irregularly sampled data, the proposed method recovers the edge and subtle information effectively. Based on the iterative alternating strategy, the denoising and interpolation problems can be solved simultaneously, which saves a substantial amount of training and processing time. We perform synthetic and field data tests to validate the performance of the proposed CNN denoising network and the simultaneous denoising and interpolation method.

ACKNOWLEDGEMENTS

We gratefully acknowledge the Supercomputing Center of USTC for their computing assistance. We appreciate the financial support of the National Natural Science Foundation of China (Grant No. 41674120).

REFERENCES
Bauschke H.H., Combettes P.L., Noll D. Joint minimization with alternating Bregman proximity operators[J]. Pacific Journal of Optimization, 2006, 2(3): 401–424.
Beckouche S., Ma Jianwei. Simultaneous dictionary learning and denoising for seismic data[J]. Geophysics, 2014, 79(3): A27–A31. DOI:10.1190/geo2013-0382.1.
Bengio Y., Simard P., Frasconi P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157–166.
Bonar D., Sacchi M. Denoising seismic data using the nonlocal means algorithm[J]. Geophysics, 2012, 77(1): A5–A8.
Canales L.L. Random noise reduction. In: Proceedings of 1984 SEG Annual Meeting[C]. Atlanta, Georgia: Society of Exploration Geophysicists, 1984.
Candès E.J., Donoho D. L. Curvelets: a Surprisingly Effective Nonadaptive Representation of Objects with Edges[R]. Stanford: Stanford University, 2000
Chanerley A.A., Alexander N.A. An approach to seismic correction which includes wavelet de-noising. In: Proceedings of the 6th Conference on Computational Structures Technology[C]. September: ACM, 2002. 107-108.
Chen Yangkang, Huang Weilin, Zhang Dong, Chen Wei. An open-source Matlab code package for improved rank-reduction 3D seismic data denoising and reconstruction[J]. Computers & Geosciences, 2016, 95: 59–66.
Claerbout J.F. Earth Soundings Analysis: Processing Versus Inversion[D]. Stanford: Stanford University, 2004.
Glorot X., Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics[C]. Sardinia, Italy: JMLR, 2010: 249-256.
Gülünay N. Seismic trace interpolation in the Fourier transform domain[J]. Geophysics, 2003, 68(1): 355–369.
He Kaiming, Zhang Xiangyu, Ren Shaoqing, Sun Jian. Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition[C]. Las Vegas, NV, USA: IEEE, 2016. 770-778.
Hennenfent G., Herrmann F.J. Seismic denoising with nonuniformly sampled curvelets[J]. Computing in Science & Engineering, 2006, 8(3): 16–25.
Herrmann F.J., Hennenfent G. Non-parametric seismic data recovery with curvelet frames[J]. Geophysical Journal International, 2008, 173(1): 233–248. DOI:10.1111/gji.2008.173.issue-1.
Hinton G.E., Osindero S., Teh Y.W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527–1554. DOI:10.1162/neco.2006.18.7.1527.
Ioffe S., Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning[C]. Lille, France: ACM, 2015. 448-456.
Kingma D.P., Ba J.L. Adam: a method for stochastic optimization. In: Proceedings of International Conference on Learning Representations. Ithaca: Amsterdam Machine Learning Lab, 2015. 1-13.
Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems[C]. Lake Tahoe, Nevad: ACM, 2012. 1097-1105.
LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. DOI:10.1109/5.726791.
Liang Jingwei, Ma Jianwei, Zhang Xiaoqun. Seismic data restoration via data-driven tight frame[J]. Geophysics, 2014, 79(3): V65–V74. DOI:10.1190/geo2013-0252.1.
Liu Bin, Sacchi M.D. Minimum weighted norm interpolation of seismic records[J]. Geophysics, 2004, 69(6): 1560–1568. DOI:10.1190/1.1836829.
Mao Xiaojiao, Shen Chunhua, Yang Yubin. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In: Proceedings of the 29th Conference on Neural Information Processing Systems[C]. Barcelona, Spain: NIPS, 2016. 2802-2810.
Naghizadeh M., Sacchi M.D. Beyond alias hierarchical scale curvelet interpolation of regularly and irregularly sampled seismic data[J]. Geophysics, 2010, 75(6): WB189–WB202. DOI:10.1190/1.3509468.
Robbins H., Monro S. A stochastic approximation method[J]. The Annals of Mathematical Statistics, 1951, 22(3): 400–407.
Rumelhart D.E., Hinton G.E., Williams R.J. Learning representations by back-propagating errors[J]. Nature, 1986, 323(6088): 533. DOI:10.1038/323533a0.
Sacchi M.D., Ulrych T.J., Walker C.J. Interpolation and extrapolation using a high-resolution discrete Fourier transform[J]. IEEE Transactions on Signal Processing, 1998, 46(1): 31–38. DOI:10.1109/78.651165.
Spitz S. Seismic trace interpolation in the F-X domain[J]. Geophysics, 1991, 56(6): 785–794. DOI:10.1190/1.1443096.
Tang Gang, Ma Jianwei, Yang Huizhu. Seismic data denoising based on learning-type overcomplete dictionaries[J]. Applied Geophysics, 2012, 9(1): 27–32.
Turner G. Aliasing in the Tau-p transform and the removal of spatially aliased coherent noise[J]. Geophysics, 1990, 55(11): 1496–1503. DOI:10.1190/1.1442797.
Yu F., Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv: 1511.07122, 2015a.
Yu Siwei, Ma Jianwei, Zhang Xiaoqun, Sacchi M.D. Interpolation and denoising of high-dimensional seismic data by learning a tight frame[J]. Geophysics, 2015b, 80(5): V119–V132. DOI:10.1190/geo2014-0396.1.
Zeiler M.D., Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of the 13th European Conference on Computer Vision[C]. Zurich, Switzerland: Springer, 2014. 818-833.
Zhang Kai, Zuo Wangmeng, Chen Yunjin, Meng Deyu, Zhang Lei. Beyond a Gaussian denoiser:residual learning of deep CNN for image denoising[J]. IEEE Transactions on Image Processing, 2017, 26(7): 3142–3155. DOI:10.1109/TIP.2017.2662206.