
Page 86                 Salmani et al. J Surveill Secur Saf 2020;1:79–101 | http://dx.doi.org/10.20517/jsss.2020.16


               Algorithm 1 Encryption
                 procedure Encryption(D, K, $\Delta_d$, φ)
                     $\hat{\Delta}_{d'}$ = BuildChain(K, $\Delta_d$, φ)
                     for all $D_i$ in D do
                         while !eof do
                             w = readNextWord($D_i$)
                             if isKeyword(w) then
                                 $\hat{w}$ = select randomly an encrypted ciphertext for w from $\hat{\Delta}_{d'}$
                             else
                                 $\hat{w}$ = encrypt w with secret key K
                             end if
                             $C_i$ += $\hat{w}$
                         end while
                         Add(C, $C_i$)
                     end for
                     return C
                 end procedure
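The control flow of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: `chain` is a hypothetical dictionary standing in for the output of BuildChain (keyword mapped to its list of ciphertexts), and `encrypt_word` is a keyed-hash placeholder rather than the cipher used in the paper.

```python
import hashlib
import hmac
import random


def encrypt_word(key: bytes, word: str) -> str:
    """Placeholder for 'encrypt w with secret key K'.

    Assumption: a truncated HMAC-SHA256 tag stands in for the real cipher.
    It is deterministic and keyed, but it is not reversible encryption.
    """
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def encrypt_collection(documents, key, chain, rng=None):
    """Encrypt every document word by word, mirroring the loop in Algorithm 1.

    documents: list of plaintext strings (the collection D)
    chain:     hypothetical dict {keyword: [ciphertext, ...]} (BuildChain output)
    Returns the encrypted collection C as a list of token lists.
    """
    rng = rng or random.Random()
    encrypted = []
    for doc in documents:
        tokens = []
        for word in doc.split():
            if word in chain:
                # Keyword: pick one of its chained ciphertexts at random.
                tokens.append(rng.choice(chain[word]))
            else:
                # Non-keyword: encrypt directly with the secret key.
                tokens.append(encrypt_word(key, word))
        encrypted.append(tokens)
    return encrypted
```

Because each keyword occurrence draws its ciphertext independently, the same keyword may appear under different ciphertexts within a single document.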



                  Since the cloud sees the encrypted document collection C, it generates the DTM matrix ($\gamma'$) based on the
                  encrypted keywords in C. Thus, the number of columns in $\gamma'$ is $d'$ rather than $d$ ($d \le d'$), and the
                  high-frequency keywords are eliminated in the whole matrix.
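To make the effect on the matrix concrete, the following minimal sketch builds a document-term count matrix directly over ciphertext tokens. The function name `build_dtm` and the toy tokens are assumptions for illustration; the paper does not prescribe this construction.

```python
from collections import Counter


def build_dtm(encrypted_docs):
    """Return (vocabulary, matrix) with one column per distinct ciphertext token."""
    vocabulary = sorted({tok for doc in encrypted_docs for tok in doc})
    column = {tok: j for j, tok in enumerate(vocabulary)}
    matrix = []
    for doc in encrypted_docs:
        row = [0] * len(vocabulary)
        for tok, count in Counter(doc).items():
            row[column[tok]] = count
        matrix.append(row)
    return vocabulary, matrix


# Toy data (hypothetical): one keyword appears under two ciphertexts, "a1" and "a2";
# "x9" is the ciphertext of a second term.
docs = [["a1", "x9", "a2"], ["a2", "a2", "a1"]]
vocab, gamma = build_dtm(docs)
```

In this toy collection the plaintext has two distinct terms, yet the matrix has three columns, and the keyword's total count of five is split across the `a1` and `a2` columns.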
                • GenerateQuery. In the initialization phase, the data owner and the data user exchange the φ vector and the
                  secret key K, which enable the data user to make an encrypted query q. In addition, because we have multiple
                  encrypted versions (ciphertexts) of each keyword, the data user can use a portion of the available ciphertexts
                  for each keyword, but the data user must use the same portion for all of the keywords in the same query so as
                  not to affect the results. For example, the data user may decide to use sixty percent of the available ciphertexts
                  for each keyword, but cannot employ sixty percent for the first keyword and forty percent for the second
                  one, because that makes the results imprecise. Finally, the encrypted query q is sent to the cloud. The data user
                  may send an optional parameter k to the cloud to retrieve only the top-k resultant documents.
                  This is one of the characteristics that distinguishes our approach from other schemes. With each query,
                  the data user is able to randomly choose some of the ciphertexts for each keyword, which delivers more
                  uncertainty and consequently more entropy. Thus, even if consecutive queries share some of their keywords,
                  the cloud is not able to find a pattern between the queries, because different versions of the ciphertexts are
                  used in each query. Moreover, co-occurring terms appear with different ciphertexts in the encrypted files, so
                  finding the co-occurring terms becomes significantly more difficult for the cloud.
                  The details of Query are shown in Algorithm 2. The data user declares the “portion” manually, or it can
                  be determined randomly by the algorithm (as we did in Algorithm 2). This parameter determines the
                  percentage of each keyword's ciphertexts that will be employed in the query encryption process. For example,
                  if $w_i$ possesses five different ciphertexts and the portion is set to sixty percent, the algorithm randomly
                  employs three of the ciphertexts for the current query. Moreover, the data user is able to generate the
                  ciphertexts, as all encrypted versions are chained together. Note that the plain query can be expressed as in
                  today’s web search engines, such as Bing® and Google®, in which data users tend to provide a sentence in
                  natural language or a set of keywords to express their intentions. In this case, we first extract the keywords
                  $\Delta_q$ from the plain query.
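The portion rule described above can be sketched as follows. This is not Algorithm 2 itself: the function name `generate_query`, the dictionary form of `chain`, the rounding rule, and the range used when the portion is drawn at random are all assumptions made for illustration.

```python
import random


def generate_query(keywords, chain, portion=None, rng=None):
    """Build an encrypted query using the same portion for every keyword.

    keywords: plaintext keywords extracted from the plain query
    chain:    hypothetical dict {keyword: [ciphertext, ...]}
    portion:  fraction in (0, 1]; drawn at random when omitted
    """
    rng = rng or random.Random()
    if portion is None:
        # Assumed range; the source only says the portion may be chosen randomly.
        portion = rng.uniform(0.1, 1.0)
    query = []
    for w in keywords:
        versions = chain[w]
        # Assumed rounding rule: nearest integer, at least one ciphertext.
        k = max(1, round(portion * len(versions)))
        # Sample k distinct ciphertexts of this keyword for the current query.
        query.extend(rng.sample(versions, k))
    return query
```

With five ciphertexts per keyword and `portion=0.6`, each keyword contributes three randomly chosen ciphertexts, which matches the sixty percent example in the text. Repeating the call with a fresh random state generally yields a different subset for the same keywords.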

                • Search. Before explaining the LRSE search algorithm we define the document and query vector and the