
Page 136                          Liu et al. Intell Robot 2023;3(2):131-43  I http://dx.doi.org/10.20517/ir.2023.07

               Minimize the JMMD distance between the two dissimilar distributions.


3.2.1. Loss function $\mathcal{L}_c$
To transfer the diagnostic capability to the target task, it is first necessary to ensure that the model has learned enough diagnostic knowledge from the source domain data. Thus, the first loss function $\mathcal{L}_c$ of our method minimizes the classification loss of fault classification on the labeled data. For data with $C$ fault classes, the required objective is the standard softmax loss function.




                                                                          
                                                                    (     )   +    
                                                1  Õ    Õ                   
                                                 = −       [      =   ] log                           (1)
                                                                Í     ( (      )   +  )  
                                                                           
                                                     =1   =1        =1      
                                                                            
               where    is the batch size and    is the number of fault classes.
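Equation (1) is the usual softmax cross-entropy averaged over the batch. As a minimal NumPy sketch (the function and variable names here are ours, not the paper's), it can be computed from the raw class scores as follows:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of n samples and C classes.

    logits: (n, C) array of raw class scores; labels: (n,) integer labels.
    """
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    # Average negative log-probability of each sample's true class.
    return -log_probs[np.arange(n), labels].mean()
```

With uniform logits the loss reduces to $\log C$, and it decreases as the score of the true class grows, matching the behavior of Eq. (1).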
3.2.2. Loss function $\mathcal{L}_d$
The primary role of the domain adaptation module is to guide the network to extract domain-invariant features under the constraint of the loss function. Borrowing ideas from generative adversarial networks, an adversarial domain-based training approach is added to learn the domain-invariant features. By placing a gradient reversal layer (GRL) in front of the domain classifier, the target domain data are confounded with the source domain data, thus maximizing the classification loss between the two domains. The domain classifier and the feature extractor compete with each other until they reach a balance, at which point domain-invariant features have been learned. However, if we align only the marginal distributions of the two datasets and ignore the correlation between labels and features, the final alignment is poor. The conditional domain adversarial network (CDA) is therefore used to capture the cross-covariance between features and labels, improving discrimination [22]. Considering the non-linear and non-smooth nature of fault signals, the joint distributions of fault features and their corresponding labels need to be aligned as closely as possible to effectively transfer the diagnostic capability. Therefore, we train CDA as the second objective here. The loss function $\mathcal{L}_d$ is shown below.



$$w(H(g)) = 1 + e^{-H(g)}, \qquad H(g) = -\sum_{c=0}^{C-1} g_c \log g_c \tag{2}$$

$$\mathcal{L}_d = -\frac{1}{n_s}\sum_{i=1}^{n_s} w\big(H(g_i^s)\big)\,\log\!\Big[1 - D\big(f_i^s \otimes g_i^s;\ \theta_f, \theta_d\big)\Big] - \frac{1}{n_t}\sum_{j=1}^{n_t} w\big(H(g_j^t)\big)\,\log\!\Big[D\big(f_j^t \otimes g_j^t;\ \theta_f, \theta_d\big)\Big] \tag{3}$$
where $\theta_f$ is the model parameter corresponding to the feature extraction module, $\theta_d$ is the parameter of the domain classifier, $C$ denotes the number of fault types, $H(g)$ denotes the uncertainty of the sample classification result, and $w(H(g))$ denotes the weight of each sample.
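The entropy-based weighting and the weighted adversarial loss can be sketched in NumPy as follows. This is an illustrative sketch, not the paper's implementation: the function names are ours, the GRL (which only affects gradients) is omitted, and the domain-classifier outputs $D(f \otimes g)$ are assumed to already be probabilities in $(0, 1)$.

```python
import numpy as np

def entropy_weight(g):
    """w(H(g)) = 1 + exp(-H(g)) as in Eq. (2), where H is the Shannon
    entropy of the softmax prediction g of shape (n, C)."""
    H = -(g * np.log(g + 1e-12)).sum(axis=1)
    return 1.0 + np.exp(-H)

def cdan_loss(d_src, d_tgt, g_src, g_tgt):
    """Entropy-weighted conditional adversarial loss in the spirit of Eq. (3).

    d_src, d_tgt: domain-classifier outputs D(f (x) g) in (0, 1), shape (n,);
    g_src, g_tgt: class-probability predictions used for the sample weights.
    """
    w_s, w_t = entropy_weight(g_src), entropy_weight(g_tgt)
    # Source samples should be classified as domain 0, target as domain 1.
    loss_src = -(w_s * np.log(1.0 - d_src + 1e-12)).mean()
    loss_tgt = -(w_t * np.log(d_tgt + 1e-12)).mean()
    return loss_src + loss_tgt
```

Note that $w(H(g))$ lies in $(1, 2]$: confident (low-entropy) predictions receive weights near 2, while uncertain ones are down-weighted toward 1, so easily classified samples dominate the conditional alignment.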
3.2.3. Loss function $\mathcal{L}_D$
Compared with the CDA method, minimizing a spatial metric distance is another approach to learning domain-invariant features. The MMD method was used by Borgwardt et al. [23] to measure the discrepancy between distributions. However, the effectiveness of aligning different distributions with MMD is limited under complex multimodal conditions. To address this problem, Long et al. [24] proposed the JMMD method to align the joint distribution over the feature space and label space, where the loss function $\mathcal{L}_D$ is defined as


$$\mathcal{L}_D = \Big\| \mathbb{E}_{S}\big[f^s \otimes g^s\big] - \mathbb{E}_{T}\big[f^t \otimes g^t\big] \Big\|_{\mathcal{H}}^2 \tag{4}$$
where $f^s$ and $f^t$ represent the outputs of the fault features, and $g^s$ and $g^t$ denote the vector representations of the labels. Unlike the standard JMMD, we add $f \otimes g$ to align the joint distribution of the two domains; $f \otimes g$ refers