
Page 136                          Liu et al. Intell Robot 2023;3(2):131-43  I http://dx.doi.org/10.20517/ir.2023.07

               Minimize the JMMD distance between the two dissimilar distributions.


3.2.1. Loss function $\mathcal{L}_c$
To transfer the diagnostic capability to the target task, it is first necessary to ensure that the model has learned enough diagnostic knowledge from the source domain data. Thus, the first loss function $\mathcal{L}_c$ of our method minimizes the classification loss of fault classification on the labeled data. For data with $C$ fault classes, the required objective is the standard softmax loss function.




                                                                          
                                                                    (     )   +    
                                                1  Õ    Õ                   
                                                 = −       [      =   ] log                           (1)
                                                                Í     ( (      )   +  )  
                                                                           
                                                     =1   =1        =1      
                                                                            
               where    is the batch size and    is the number of fault classes.
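Equation (1) is the usual softmax cross-entropy averaged over the batch. As a minimal NumPy sketch (the function and variable names here are ours, not the paper's), it can be computed from the raw class scores as follows:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of n samples and C classes.

    logits: (n, C) array of raw class scores; labels: (n,) integer labels.
    """
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    # Average negative log-probability of each sample's true class.
    return -log_probs[np.arange(n), labels].mean()
```

With uniform logits the loss reduces to $\log C$, and it decreases as the score of the true class grows, matching the behavior of Eq. (1).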
3.2.2. Loss function $\mathcal{L}_d$
The primary role of the domain adaptation module is to guide the network to extract domain-invariant features under the constraint of the loss function. Borrowing ideas from generative adversarial networks, an adversarial domain-based training approach is added to learn the domain-invariant features. By placing a gradient reversal layer (GRL) in front of the domain classifier, the target domain data are confounded with the source domain data, thus maximizing the classification loss between the two domains. The domain classifier and the feature extractor compete with each other until they reach a balance, at which point domain-invariant features have been learned. However, if we align only the marginal distributions of the two datasets and ignore the correlation between labels and features, the final alignment is poor. The conditional domain adversarial network (CDA) is therefore used to capture the cross-covariance between features and labels, improving discrimination [22]. Considering the non-linear and non-smooth nature of fault signals, the joint distributions of fault features and their corresponding labels need to be aligned as closely as possible to effectively transfer the diagnostic capability. Therefore, we train CDA as the second objective here. The loss function $\mathcal{L}_d$ is shown below.



$$w(H(g)) = 1 + e^{-H(g)}, \qquad H(g) = -\sum_{c=0}^{C-1} g_c \log g_c \tag{2}$$

$$\mathcal{L}_d = -\frac{1}{n_s}\sum_{i=1}^{n_s} w\big(H(g_i^s)\big)\,\log\!\Big[1 - D\big(f_i^s \otimes g_i^s;\ \theta_f, \theta_d\big)\Big] - \frac{1}{n_t}\sum_{j=1}^{n_t} w\big(H(g_j^t)\big)\,\log\!\Big[D\big(f_j^t \otimes g_j^t;\ \theta_f, \theta_d\big)\Big] \tag{3}$$
where $\theta_f$ is the model parameter corresponding to the feature extraction module, $\theta_d$ is the parameter of the domain classifier, $C$ denotes the number of fault types, $H(g)$ denotes the uncertainty of the sample classification result, and $w(H(g))$ denotes the weight of each sample.
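The entropy-based weighting and the weighted adversarial loss can be sketched in NumPy as follows. This is an illustrative sketch, not the paper's implementation: the function names are ours, the GRL (which only affects gradients) is omitted, and the domain-classifier outputs $D(f \otimes g)$ are assumed to already be probabilities in $(0, 1)$.

```python
import numpy as np

def entropy_weight(g):
    """w(H(g)) = 1 + exp(-H(g)) as in Eq. (2), where H is the Shannon
    entropy of the softmax prediction g of shape (n, C)."""
    H = -(g * np.log(g + 1e-12)).sum(axis=1)
    return 1.0 + np.exp(-H)

def cdan_loss(d_src, d_tgt, g_src, g_tgt):
    """Entropy-weighted conditional adversarial loss in the spirit of Eq. (3).

    d_src, d_tgt: domain-classifier outputs D(f (x) g) in (0, 1), shape (n,);
    g_src, g_tgt: class-probability predictions used for the sample weights.
    """
    w_s, w_t = entropy_weight(g_src), entropy_weight(g_tgt)
    # Source samples should be classified as domain 0, target as domain 1.
    loss_src = -(w_s * np.log(1.0 - d_src + 1e-12)).mean()
    loss_tgt = -(w_t * np.log(d_tgt + 1e-12)).mean()
    return loss_src + loss_tgt
```

Note that $w(H(g))$ lies in $(1, 2]$: confident (low-entropy) predictions receive weights near 2, while uncertain ones are down-weighted toward 1, so easily classified samples dominate the conditional alignment.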
3.2.3. Loss function $\mathcal{L}_D$
Compared with the CDA method, minimizing a spatial metric distance is another approach to learning domain-invariant features. The MMD method was used by Borgwardt et al. [23] to measure the discrepancy between distributions. However, the effectiveness of aligning different distributions with MMD is limited under complex multimodal conditions. To address this problem, Long et al. [24] proposed the JMMD method to align the joint distribution over the feature space and label space, where the loss function $\mathcal{L}_D$ is defined as


$$\mathcal{L}_D = \Big\| \mathbb{E}_{S}\big[f^s \otimes g^s\big] - \mathbb{E}_{T}\big[f^t \otimes g^t\big] \Big\|_{\mathcal{H}}^2 \tag{4}$$
where $f^s$ and $f^t$ represent the outputs of the fault features, and $g^s$ and $g^t$ denote the vector representations of the labels. Unlike the standard JMMD, we add $f \otimes g$ to align the joint distribution of the two domains; $f \otimes g$ refers