Page 94 - Read Online
P. 94

Salmani et al. J Surveill Secur Saf 2020;1:79–101  I http://dx.doi.org/10.20517/jsss.2020.16  Page 87


               Algorithm 2 Query
                 1: procedure Query(∆ q ,K, φ)
                 2:    por = randomly choose a portion
                 3:    for all w i in ∆ q do
                 4:        neededVersions=dφ i ×pore
                 5:        j = 1
                 6:        while j ≤ neededVersions do
                 7:            ˆ w = choose randomly one encrypted versions of w i
                 8:            q + = ˆw
                 9:        end while
                10:    end for
                11:    return q
                12: end procedure


                  relevance score:
                  Definition5. (DocumentVector). Let ∆ d bethedictionaryofkeywordsfromcorpusD. Adocumentvector
                                 ) is a normalized vector that demonstrates the normalized frequency of each keyword
                  T i = ( f w 1 ,. . ., f w d
                    ,1 ≤ j ≤ d in document D i. Let T = {T 1 ,. . .,T n } be a set of all document vectors.
                  f w j
                  Definition 6. (Query Vector). A query vector Q is a d-element boolean vector such that ∀ 1 ≤ i ≤ d,Q[i] =
                  1 if w i ∈ ∆ q, and 0 otherwise.


                  Definition 7. (Relevance Score). Let T i denote the normalized frequency vector of the document D i and Q
                  be the query vector from ∆ q. Their relevance score s i is inner product of document D i to query vector Q.

                  Recall that upon receiving the encrypted document collection C the cloud builds its own DTM matrix
                                                                               ˆ
                  γ and based on that it generates a set of encrypted document vectors T = {T 1 ,. . .,T n }. However, the
                                                                                          ˆ
                                                                                     ˆ
                   0
                  cloud cannot make the real γ matrix and T (see Section 4.1). Upon receiving the encrypted query q, the
                  cloud server executes Search algorithm. The LRSE Search(q,T, k) algorithm takes the encrypted query q,
                                                                      ˆ
                  encrypted document vectors T, and an optional k, and returns the top-k encrypted resultant documents
                                           ˆ
                  corresponding to R(q) to the data user. Consider that the resultant documents are ranked according to
                  their relevance score s.
                • Decryption. The data user executes Decryption(C i ,K,∆ d ) for each C i in the resultant documents. This
                                                                 ˆ
                                                                  0
                                                                                                   ˆ
                  algorithm inputs an encrypted document C i, a secret key K, and an encrypted keyword dictionary ∆ d and
                                                                                                     0,
                  outputs D i. Keywords in the resultant documents are encrypted randomly with a different encrypted piece
                  of the chain. Thus, before decrypting the results, Decryption algorithm normalizes the results by changing
                  each keyword’s ciphertext with the first ciphertext in the chain. It then decrypts the normalized encrypted
                  document using the secret key K.

               4.2  Scheme II - Index
               InSchemeIIweemployanindexstructuretoincreasethesearchspeed. InSchemeIweencrypteachdocument
               keyword by keyword, i.e., each block cipher contains one keyword (we consider each block large enough to
               store an encrypted keyword). In contrast, in Scheme II each block cipher contains 128 bits of a file (which may
               vary depend on the protocol and employed encryption scheme). Hence, the cloud learns nothing from the
               encrypted documents D and is not able to generate its own γ matrix, so must to rely on the index (provided
                                                                  0
               by the data owner) to find and rank the results.

               Furthermore, thecloudisnot abletolearn the keywordpositions in the documents. Wealso employthesecure
   89   90   91   92   93   94   95   96   97   98   99