Page 2 of 10 Bao et al. Complex Eng Syst 2022;2:16 | http://dx.doi.org/10.20517/ces.2022.30
1. INTRODUCTION
The search for pulsars is of great importance to astronomy, physics and other fields, with applications including
gravitational waves, the equation of state of dense matter, stellar evolution, dark matter and dark energy, and
the formation and evolution of binary and multiple star systems. Therefore, the discovery of new pulsars and
the exploration of their substantial scientific research potential are of great value and importance.
At present, more than 2,700 pulsars have been discovered in the Galaxy [1]. Most of these pulsars were
discovered by modern radio telescopes, which receive periodic radio signals, pre-process them and package
them into the data we need. The generation of pulsar candidates from the collected data generally consists of
three steps: eliminating radio frequency interference (RFI), de-dispersion, and the fast Fourier transform (FFT) [2]. This strategy
is generally how the samples of pulsar candidates are generated. With the continuous improvement of modern
radio telescopes, samples of pulsar candidates have increased, but only a small fraction of these samples are
real pulsars due to the presence of RF interference and various noise sources. As a result, genuine pulsars are
far outnumbered by non-pulsar candidates. Traditionally, human experts review each
candidate in 1-300 s [3], and it takes more than 70,000 h to examine millions of pulsar candidates. Therefore, it
is crucial to investigate an automatic, efficient and accurate method for pulsar candidate identification.
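
As a rough illustration of the last two candidate-generation steps (RFI excision is omitted), the following Python sketch de-disperses a synthetic multi-channel recording and reads the dominant period off an FFT power spectrum. The dispersion constant is the standard cold-plasma value. The function names, observing band, dispersion measure and pulse shape are illustrative assumptions and are not taken from the pipelines cited above.

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant in MHz^2 pc^-1 cm^3 s

def dispersion_delay(dm, freq_mhz, ref_mhz):
    """Arrival delay (s) at freq_mhz relative to ref_mhz for dispersion measure dm."""
    return K_DM * dm * (freq_mhz ** -2.0 - ref_mhz ** -2.0)

def dedisperse(data, freqs_mhz, dm, dt):
    """Undo the per-channel delay and sum over frequency.

    data has shape (n_channels, n_samples); dt is the sampling time in seconds.
    np.roll wraps around, which is acceptable only for this periodic toy signal.
    """
    ref = freqs_mhz.max()
    series = np.zeros(data.shape[1])
    for chan, freq in zip(data, freqs_mhz):
        shift = int(round(dispersion_delay(dm, freq, ref) / dt))
        series += np.roll(chan, -shift)
    return series

def strongest_period(series, dt):
    """Period (s) of the highest peak in the FFT power spectrum, ignoring the DC bin."""
    power = np.abs(np.fft.rfft(series - series.mean())) ** 2
    freqs = np.fft.rfftfreq(series.size, d=dt)
    peak = np.argmax(power[1:]) + 1
    return 1.0 / freqs[peak]

# Synthetic observation (all values below are assumed, for illustration only).
rng = np.random.default_rng(0)
n_chan, n_samp, dt = 16, 4096, 1e-3
period_samples, true_dm = 128, 50.0
freqs_mhz = np.linspace(1200.0, 1500.0, n_chan)

# Gaussian pulse repeated every 128 samples (0.128 s).
phase = np.arange(n_samp) % period_samples
profile = np.exp(-0.5 * ((phase - 64) / 16.0) ** 2)

# Delay each channel according to the dispersion law and add white noise.
data = np.empty((n_chan, n_samp))
for i, freq in enumerate(freqs_mhz):
    shift = int(round(dispersion_delay(true_dm, freq, freqs_mhz.max()) / dt))
    data[i] = np.roll(profile, shift) + rng.normal(0.0, 0.1, n_samp)

series = dedisperse(data, freqs_mhz, true_dm, dt)
print("recovered period (s):", strongest_period(series, dt))
```

In a real search the dispersion measure is unknown, so this procedure would be repeated over a grid of trial values, and each significant spectral peak would become one candidate for review.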
In recent years, a large number of object detection methods based on neural networks have been proposed [4],
many of which have been adapted for pulsar candidate identification to handle the huge volume of pulsar
data. Bates et al. [5] used
artificial neural networks to automatically identify plausible pulsar candidates from pulsar measurements.
Morello et al. [6] proposed a method called SPINN (Straightforward Pulsar Identification using Neural Networks),
a pulsar candidate classifier designed to maximize the recall of identification. Zhu
et al. [7] developed a pulsar image-based classification system (PICS) that used image pattern recognition and
deep neural networks to identify pulsars in recent measurements, and Lyon et al. [8] proposed the decision
tree-based recognition model Very Fast Decision Tree (VFDT), a method that found 20 new pulsars using the
LOTAAS dataset.
Although the above neural network-based methods have achieved good identification results on the corresponding
datasets and helped astronomers discover new pulsars, there are still some problems. Among the
currently available pulsar candidate data, the number of positive samples (real pulsars) among the labelled pulsar
candidates is extremely limited, and the number of negative samples (non-real pulsars) is much higher than
the number of positive samples. In this case, when some deep learning models are directly used for training,
the imbalance between the number of positive and negative samples leads to poor classification, overfitting,
and even possible training failure. By running different classifiers on the HTRU dataset [10], Lyon et al. [9]
confirmed that the imbalance of pulsar candidate samples reduces the recall of pulsars.
Lyon et al. [11] then proposed using the Hellinger distance (HDT) as a splitting criterion for VFDT,
thus alleviating the sample imbalance problem. In addition, Generative Adversarial Networks (GAN) [14] have
recently been widely used in pulsar candidate identification [12]. For example, Guo et al. [13] proposed using
a GAN to generate positive pulsar sample data to alleviate the problem of low recall for pulsar
candidate identification models on unbalanced datasets.
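
To make the appeal of a Hellinger-distance splitting criterion concrete, the sketch below computes the Hellinger distance between the two class-conditional distributions induced by a candidate split. It follows the commonly used textbook formulation and is not the implementation of Lyon et al. [11]; the function name and the example counts are hypothetical.

```python
import math

def hellinger_split(pos_counts, neg_counts):
    """Hellinger distance between the class-conditional branch distributions.

    pos_counts[j] and neg_counts[j] are the numbers of positive and negative
    samples sent to branch j by a candidate split. Only the fraction of each
    class per branch enters the formula, so the class ratio itself does not.
    """
    n_pos = sum(pos_counts)
    n_neg = sum(neg_counts)
    return math.sqrt(sum(
        (math.sqrt(p / n_pos) - math.sqrt(n / n_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))

# Hypothetical two-way split: 9 of 10 pulsars go left, 90% of non-pulsars go right.
mild_skew = hellinger_split([9, 1], [100, 900])
# Same split quality with ten times as many negatives.
heavy_skew = hellinger_split([9, 1], [1000, 9000])

print(mild_skew, heavy_skew)
```

Both calls return the same score even though the second dataset is ten times more imbalanced, which is the property that makes the criterion attractive when real pulsars are rare. The score ranges from 0 for a split that separates nothing to sqrt(2) for a perfect split.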
Although the above methods can alleviate the sample imbalance problem to a certain extent, the traditional
GAN model suffers from the mode collapse problem while generating positive samples [15]. Therefore,
WGAN [16], a Wasserstein distance-based generative adversarial network, is adopted in the proposed
method to alleviate the mode collapse problem and enlarge the present pulsar dataset. WGAN is first used
to generate images that approximate real pulsars as positive samples, and the generated
positive sample images are then merged into the unbalanced dataset to train the pulsar recognition model. Experiments showed
that training the deep neural network model on the balanced dataset could further improve the model’s recog-