
Page 2 of 10                    Bao et al. Complex Eng Syst 2022;2:16  |  http://dx.doi.org/10.20517/ces.2022.30



               1. INTRODUCTION
               The search for pulsars is of great importance to astronomy, physics and other fields, with applications to
               gravitational waves, the equation of state of dense matter, stellar evolution, dark matter and dark energy, and
               the formation and evolution of binary and multiple star systems. Discovering new pulsars and exploring
               their substantial scientific potential are therefore of great value.


               At present, more than 2,700 pulsars have been discovered in the whole galaxy [1]. Most of the pulsars were
               discovered by modern radio telescopes, which receive periodic radio signals, pre-process them and package
               them into the data we need. The generation of pulsar candidates from the collected data is basically divided into
               three procedures: eliminating radio frequency interference (RFI), de-dispersion, and fast Fourier transform (FFT) [2]. This strategy
               is generally how the samples of pulsar candidates are generated. With the continuous improvement of modern
               radio telescopes, samples of pulsar candidates have increased, but only a small fraction of these samples are
               real pulsars due to the presence of RFI and different noise sources. As a result, the number of real pulsars
               among the candidates is much smaller than the number of non-pulsars. In traditional studies, human experts
               manually review each candidate in 1-300 s [3], and it takes more than 70,000 h to examine millions of pulsar
               candidates. Therefore, it
               is crucial to investigate an automatic, efficient and accurate method for pulsar candidate identification.
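
               The de-dispersion and FFT steps described above can be illustrated with a minimal sketch. It is a toy
               example under stated assumptions, not the pipeline used by any survey: the data are simulated, RFI
               removal is skipped, and all function names and parameter values (32 channels, 1200-1500 MHz, a 50 ms
               period, a dispersion measure of 50 pc cm^-3) are hypothetical. It uses only the standard cold-plasma
               dispersion delay, which scales as DM times frequency to the power -2.

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3

def dispersion_delay(dm, freq_mhz, ref_mhz):
    """Arrival delay (s) at freq_mhz relative to ref_mhz for dispersion measure dm."""
    return K_DM * dm * (freq_mhz ** -2.0 - ref_mhz ** -2.0)

def simulate(freqs_mhz, dm, period, tsamp, n_samp, rng):
    """Toy filterbank (channels x samples): dispersed Gaussian pulses plus unit noise."""
    t = np.arange(n_samp) * tsamp
    delays = dispersion_delay(dm, freqs_mhz, freqs_mhz.max())
    phase = ((t[None, :] - delays[:, None]) / period) % 1.0
    pulses = np.exp(-0.5 * ((phase - 0.5) / 0.1) ** 2)
    return pulses + rng.normal(0.0, 1.0, size=pulses.shape)

def dedisperse(data, freqs_mhz, dm, tsamp):
    """Shift each channel back by its delay (whole samples) and sum over channels."""
    delays = dispersion_delay(dm, freqs_mhz, freqs_mhz.max())
    shifts = np.round(delays / tsamp).astype(int)
    series = np.zeros(data.shape[1])
    for channel, shift in zip(data, shifts):
        series += np.roll(channel, -shift)  # circular shift; acceptable for this toy data
    return series

def spectral_peak(series, tsamp):
    """Return (frequency in Hz, power) of the strongest non-DC Fourier bin."""
    power = np.abs(np.fft.rfft(series - series.mean())) ** 2
    freqs = np.fft.rfftfreq(series.size, d=tsamp)
    k = 1 + int(np.argmax(power[1:]))
    return freqs[k], power[k]

# Hypothetical observation: 4 s of data holding exactly 80 pulse periods.
rng = np.random.default_rng(0)
freqs_mhz = np.linspace(1200.0, 1500.0, 32)
data = simulate(freqs_mhz, dm=50.0, period=0.05, tsamp=1e-3, n_samp=4000, rng=rng)

f_hit, p_hit = spectral_peak(dedisperse(data, freqs_mhz, 50.0, 1e-3), 1e-3)
f_raw, p_raw = spectral_peak(dedisperse(data, freqs_mhz, 0.0, 1e-3), 1e-3)
print(f"trial DM 50: peak at {f_hit:.2f} Hz, power {p_hit:.3g}")
print(f"trial DM  0: peak at {f_raw:.2f} Hz, power {p_raw:.3g}")
```

               In a real search the dispersion measure is unknown, so many trial values are scanned and every
               significant spectral peak becomes a candidate, which is one reason the number of candidates grows so
               quickly.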

               In recent years, a large number of object detection methods based on neural networks have been proposed [4],
               many of which have been selected for pulsar candidate detection. Pulsar candidate identification methods
               based on neural networks have been proposed to handle the huge amount of pulsar data. Bates et al. [5]  used
               artificial neural networks to automatically identify plausible pulsar candidates from pulsar measurements.
               Morello et al. [6]  proposed a method called SPINN (Straightforward Pulsar Identification using Neural Net-
               works), which designed a pulsar candidate classifier that tended to maximize the recall of identification. Zhu
               et al. [7]  developed a pulsar image-based classification system (PICS) that used image pattern recognition and
               deep neural networks to identify pulsars in recent measurements, and Lyon et al. [8]  proposed the decision
               tree-based recognition model Very Fast Decision Tree (VFDT), a method that found 20 new pulsars using the
               LOTAAS dataset.

               Although the above methods have achieved good identification results on their respective datasets and
               helped astronomers discover new pulsars, some problems remain. In the currently available labelled pulsar
               candidate data, positive samples (real pulsars) are extremely scarce and are far outnumbered by negative
               samples (non-pulsars). When deep learning models are trained directly on such data,
               the imbalance between the number of positive and negative samples leads to poor classification, overfitting,
               and even possible training failure.
               Lyon et al. [9] confirmed, by running different classifiers on the HTRU dataset [10], that the imbalance
               of pulsar candidate samples reduces the recall of pulsars. Lyon et al. [11] then proposed using the Hellinger
               distance (HDT) as a splitting criterion for VFDT, thus alleviating the sample imbalance problem. In addition,
               Generative Adversarial Networks (GAN) [14] have recently been widely used in pulsar candidate
               identification [12]. For example, Guo et al. [13] proposed using a GAN to generate positive pulsar sample
               data to alleviate the problem of low recall for pulsar candidate identification models on unbalanced
               datasets.
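
               To make the role of the Hellinger distance concrete, the following is a minimal sketch of a Hellinger
               split score for a single threshold on one feature. It follows the general form used in the literature
               on skew-insensitive decision trees and is not claimed to reproduce the implementation in [11]; the
               function name and the toy data are hypothetical. The score compares the fraction of positives and the
               fraction of negatives sent to each branch, so it depends on within-class proportions only and does
               not change when one class is simply more numerous.

```python
import numpy as np

def hellinger_split(feature, labels, threshold):
    """Hellinger distance between the positive and negative class
    distributions over the two branches of the split feature <= threshold.

    Assumes both classes are present. Ranges from 0 (branches carry no
    class information) to sqrt(2) (perfect separation).
    """
    feature = np.asarray(feature, dtype=float)
    pos = np.asarray(labels, dtype=bool)
    neg = ~pos
    left = feature <= threshold
    total = 0.0
    for branch in (left, ~left):
        p = np.sum(branch & pos) / np.sum(pos)  # share of positives in this branch
        n = np.sum(branch & neg) / np.sum(neg)  # share of negatives in this branch
        total += (np.sqrt(p) - np.sqrt(n)) ** 2
    return float(np.sqrt(total))

# Hypothetical candidate scores: 1 = pulsar, 0 = non-pulsar.
x = [0.1, 0.4, 0.6, 0.9, 0.3, 0.7]
y = [0, 0, 1, 1, 1, 0]
balanced = hellinger_split(x, y, 0.5)

# Repeat every negative 100 times to mimic a heavily imbalanced dataset.
x_neg = [v for v, c in zip(x, y) if c == 0]
x_skew = x + x_neg * 99
y_skew = y + [0] * (len(x_neg) * 99)
skewed = hellinger_split(x_skew, y_skew, 0.5)

print(balanced, skewed)  # the two scores agree up to floating-point error
```

               A tree learner would evaluate this score over candidate thresholds and features and split on the
               largest value, in place of an impurity measure that is sensitive to the class ratio.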


               Although the above methods can alleviate the sample imbalance problem to a certain extent, the traditional
               GAN model suffers from the mode collapse problem when generating positive samples [15]. Therefore, the
               proposed method adopts WGAN [16], a Wasserstein distance-based generative adversarial network, to
               alleviate mode collapse and enlarge the present pulsar dataset. WGAN is first used to generate images
               that approximate real pulsars as positive samples; the generated positive sample images are then fused
               into the unbalanced dataset to train the pulsar recognition model. Experiments showed
               that training the deep neural network model on the balanced dataset could further improve the model’s recog-