
Salmani et al. J Surveill Secur Saf 2020;1:79–101  I http://dx.doi.org/10.20517/jsss.2020.16  Page 91


Proof. Consider:

$$\mathrm{LRSE} : [0,1]^{d} \mapsto [0,1]^{d'} \quad \text{such that} \quad d' \geq d \tag{1}$$

and

$$T_i \in [0,1]^{d} \quad \text{and} \quad T_i' \in [0,1]^{d'}.$$


The entropy of the document vector $T_i = (f_{w_1}, \ldots, f_{w_d})$ is

$$H(T_i) = -\sum_{j=1}^{d} f_{w_j} \times \log(f_{w_j}),$$

and the entropy of $T_i' = (f'_{w_1}, \ldots, f'_{w_{d'}})$ is

$$H(T_i') = -\sum_{j=1}^{d'} f'_{w_j} \times \log(f'_{w_j}).$$
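The entropy definition above can be evaluated directly. The following is a minimal sketch, assuming the natural logarithm (the proof does not fix the base) and the usual convention $0 \log 0 = 0$ for zero entries. The helper name `entropy` and the sample vector are hypothetical.

```python
import math


def entropy(vec):
    """Return H(T) = -sum_j f_j * log(f_j) for a document vector T.

    Entries are checked to lie in [0, 1]; zero entries contribute nothing,
    following the usual convention 0 * log(0) = 0. The natural log is an
    assumption: the source does not fix the base.
    """
    total = 0.0
    for f in vec:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"entry {f} is outside [0, 1]")
        if f > 0.0:
            total -= f * math.log(f)
    return total


T_i = [0.5, 0.25, 0.0]          # hypothetical d = 3 keyword frequencies
print(round(entropy(T_i), 4))   # 0.6931, i.e. log(2)
```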
Recall that $\varphi = (l_1, \ldots, l_d)$ with $\sum_{\ell=1}^{d} l_\ell = d'$, and that for any $f_{w_j} \in T_i$ we have

$$f_{w_j} = \sum_{k=1}^{l_j} f'_{w_{(k+\alpha_{w_j})}}, \quad \text{where} \quad \alpha_{w_j} = \sum_{\ell=1}^{j-1} l_\ell.$$
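The role of $\varphi$ and the offsets $\alpha_{w_j}$ can be illustrated as follows. Each $f_{w_j}$ occupies $l_j$ consecutive slots of $T_i'$, starting after the $\alpha_{w_j}$ slots used by $w_1, \ldots, w_{j-1}$. This is a sketch only. The helpers `split_vector` and `recombine` are hypothetical. The random split weights are an assumption for illustration, because this page does not say how LRSE chooses the split values.

```python
import random


def split_vector(T, phi, rng):
    """Hypothetical split of T (length d) into T' (length d' = sum(phi)).

    Entry f_{w_j} is spread over l_j = phi[j] consecutive slots of T' whose
    values sum back to f_{w_j}. The random weights are an assumption made for
    illustration; this page does not say how LRSE chooses the split values.
    """
    if len(T) != len(phi) or any(l < 1 for l in phi):
        raise ValueError("phi needs one length l_j >= 1 per entry of T")
    T_prime = []
    for f, l in zip(T, phi):
        weights = [rng.random() + 1e-9 for _ in range(l)]  # strictly positive
        s = sum(weights)
        T_prime.extend(f * (w / s) for w in weights)
    return T_prime


def recombine(T_prime, phi):
    """Recover f_{w_j} = sum_{k=1}^{l_j} f'_{w_(k + alpha_j)}, alpha_j = l_1 + ... + l_{j-1}."""
    T = []
    alpha = 0  # alpha_{w_j}: number of slots used by w_1, ..., w_{j-1}
    for l in phi:
        # 1-based slot k + alpha is 0-based index alpha + k - 1, for k = 1..l
        T.append(sum(T_prime[alpha + k - 1] for k in range(1, l + 1)))
        alpha += l
    return T


rng = random.Random(0)
T_i = [0.5, 0.25, 0.1]    # hypothetical d = 3 frequencies
phi = [2, 1, 3]           # d' = 2 + 1 + 3 = 6 >= d
T_i_prime = split_vector(T_i, phi, rng)
print(len(T_i_prime))     # 6
print(recombine(T_i_prime, phi))  # approximately [0.5, 0.25, 0.1]
```

Because every part is a fraction of $f_{w_j} \le 1$, the entries of the split vector stay in $[0,1]$, as $T_i' \in [0,1]^{d'}$ requires.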
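Hence, writing $m = l_j$ and relabeling the $m$ components of $T_i'$ that make up $f_{w_j}$ as $f'_{w_1}, \ldots, f'_{w_m}$:

$$-f_{w_j} \log(f_{w_j}) = -(f'_{w_1} + \cdots + f'_{w_m}) \log(f'_{w_1} + \cdots + f'_{w_m}), \quad \text{where} \quad f_{w_j} = \sum_{k=1}^{m} f'_{w_k}. \tag{2}$$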
Moreover, $\log(x)$ is monotonically increasing and the entries of $T_i'$ are non-negative (by (1); as usual, $0 \log 0 = 0$). Therefore $f'_{w_1} \leq f'_{w_1} + \cdots + f'_{w_m}$, which gives

$$-(f'_{w_1}) \times \log(f'_{w_1} + \cdots + f'_{w_m}) \leq -(f'_{w_1}) \times \log(f'_{w_1}), \quad \text{where} \quad f_{w_j} = \sum_{k=1}^{m} f'_{w_k}. \tag{3}$$
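As a small numeric illustration of (2) and (3), take $m = 2$ with the hypothetical values $f'_{w_1} = 0.2$ and $f'_{w_2} = 0.3$, so $f_{w_j} = 0.5$. The natural logarithm is assumed here, although the inequalities hold for any base greater than 1:

$$
\begin{aligned}
-0.2\log(0.5) &\approx 0.1386 \leq -0.2\log(0.2) \approx 0.3219,\\
-0.3\log(0.5) &\approx 0.2079 \leq -0.3\log(0.3) \approx 0.3612,\\
-0.5\log(0.5) &\approx 0.3466 \leq 0.3219 + 0.3612 = 0.6831.
\end{aligned}
$$

The first two lines are instances of (3), one per component. Their left-hand sides add up to the $-f_{w_j}\log(f_{w_j})$ term of (2), which gives the third line.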
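Inequality (3) holds in the same way for each of $f'_{w_2}, \ldots, f'_{w_m}$. Summing these $m$ inequalities and using (2) gives

$$-f_{w_j} \log(f_{w_j}) \leq -\sum_{k=1}^{m} f'_{w_k} \log(f'_{w_k}).$$

Finally, sum over all keyword frequencies in $T_i$. The $d$ groups of components partition the $d'$ entries of $T_i'$, so this yields

$$-\sum_{j=1}^{d} f_{w_j} \times \log(f_{w_j}) \leq -\sum_{j=1}^{d'} f'_{w_j} \times \log(f'_{w_j}).$$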
Thus we have:

$$H(T_i') \geq H(T_i).$$
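The conclusion $H(T_i') \geq H(T_i)$ can be spot-checked numerically. This is a sketch under stated assumptions. The vectors are random with entries in $[0,1)$. Each $l_j$ is drawn from 1 to 4, so $d' \geq d$. The split rule, `random_split`, is a hypothetical stand-in, not the LRSE rule. The natural log is used. The helpers `entropy`, `random_split`, and `smallest_gap` are illustrative names only.

```python
import math
import random


def entropy(vec):
    # H(T) = -sum f * log(f); zero entries are skipped (0 * log(0) = 0).
    # Natural log is an assumption; any base > 1 preserves the inequality.
    return -sum(f * math.log(f) for f in vec if f > 0.0)


def random_split(T, phi, rng):
    # Hypothetical split: f_{w_j} becomes phi[j] non-negative parts summing to
    # f_{w_j}. The random weights are an illustrative assumption, not LRSE's rule.
    T_prime = []
    for f, l in zip(T, phi):
        weights = [rng.random() + 1e-9 for _ in range(l)]
        s = sum(weights)
        T_prime.extend(f * (w / s) for w in weights)
    return T_prime


def smallest_gap(trials, rng):
    """Smallest observed H(T') - H(T) over random vectors and split patterns."""
    worst = math.inf
    for _ in range(trials):
        d = rng.randint(1, 8)
        T = [rng.random() for _ in range(d)]         # entries in [0, 1)
        phi = [rng.randint(1, 4) for _ in range(d)]  # each l_j >= 1, so d' >= d
        gap = entropy(random_split(T, phi, rng)) - entropy(T)
        worst = min(worst, gap)
    return worst


# The result above predicts the gap is never negative (up to float rounding).
print(smallest_gap(10_000, random.Random(42)) >= -1e-12)  # True
```

Setting every $l_j = 1$ makes the split vector identical to $T_i$, so the gap is exactly zero. This is the boundary case of the inequality.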