               4   LRSE SCHEME
To meet these requirements, we propose the LRSE scheme. LRSE is a privacy-preserving multi-keyword ranked search mechanism over encrypted files. We adopt the secure inner product proposed by Wong et al. [14]. The index SI and the query q are both protected using this encryption strategy.

The key idea in our approach is to generate a variety of ciphertexts for high-frequency keywords from the corpus D. Hence, multiple encrypted versions of a keyword are used to search for the same keyword. This directly affects the search pattern ψ that the adversary collects in order to recover the keyword plaintext. We propose two schemes: Scheme I presents a solution for searching with sequential scan using our novel chaining notion, and Scheme II shows how the chaining notion can be employed to handle controlled searching with an outsourced encrypted index.

               4.1  Scheme I - sequential scan
Recall from Definition 1 that a searchable encryption scheme consists of a suite of six algorithms. Here, we explain how each algorithm works in Scheme I. In the setup phase, besides calling KeyGen to generate the secret key K, we generate a document-term matrix and a required-ciphertext vector. The details of Scheme I are presented below:

• Setup. Let γ be an n × d document-term matrix (DTM) in which element e_{ij} represents the frequency of keyword w_j in document D_i. Let φ = (l_1, ..., l_d) be the required-ciphertext vector, where l_i, 1 ≤ i ≤ d, is the number of required ciphertexts for keyword w_i. We first extract the corpus keyword dictionary ∆_d from the entire corpus D and then generate the document-term matrix γ. Afterwards, we generate the required-ciphertext vector φ (see below for more details) and call the KeyGen algorithm to generate the secret key K that will be used to encrypt the documents. An illustrative code sketch of this step is given after this list.

• RequiredCiphertext. The probability of querying high-frequency keywords is high, so these keywords are more prone to privacy leakage. In a strong attack model [12] where the cloud server is equipped with additional knowledge, such as term frequency statistics, the attacker (the cloud) can extract valuable information from the encrypted files [16]. Moreover, with each query issued to the cloud, the data user reveals more private information, such as her interests and hobbies. Thus, there are two main challenges: 1) hiding the keyword frequencies in the outsourced encrypted documents; and 2) obscuring the frequency of the queried keywords, which may otherwise expose the underlying keywords [12].

In an ideal world, the keyword frequencies are all equal and the data user queries every available keyword uniformly, so the cloud's observations of the encrypted files and of the queries leak no information. Although this is impossible in the real world, our goal is to break down the keyword frequencies so as to approach this ideal scenario. For this purpose we calculate the average of each column (the average frequency of each keyword over the document collection D). Note that because we assume a uniform distribution, the average and the median are effectively the same. Let A = (a_1, ..., a_d) be the average vector in which a_i, 1 ≤ i ≤ d, is the average frequency of keyword w_i in corpus D; thus,

$$\forall\, w_i \in \Delta_d,\quad a_i = \frac{\sum_{j=1}^{n} e_{ji}}{n}.$$
To determine the number of required ciphertexts for each keyword (l_i), we define a new measure, the threshold τ, as the floor of the minimum element of the average vector A. Finally, we obtain the required-ciphertext vector φ by taking, for each keyword, the ceiling of its maximum frequency in the DTM γ divided by the threshold τ:

$$\forall\, w_i \in \Delta_d,\quad l_i = \left\lceil \frac{\max_{1 \le j \le n} e_{ji}}{\tau} \right\rceil,$$

where max_{1≤j≤n} e_{ji} is the maximum value in the column of γ corresponding to w_i. A small code sketch of this computation follows the list.
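The following Python sketch illustrates the Setup step on a toy corpus: it extracts a keyword dictionary ∆_d and builds the n × d document-term matrix γ. The corpus, the tokenizer, and all identifiers are illustrative assumptions and do not come from the LRSE paper; the sketch only mirrors the construction described above.

```python
# Sketch of the Setup step (illustrative, not the authors' implementation):
# build the keyword dictionary Delta_d and the n x d document-term matrix gamma.
from collections import Counter

import numpy as np

# Hypothetical toy corpus D = {D_1, ..., D_n}, here n = 3.
corpus = [
    "secure search over encrypted data",
    "encrypted data search search",
    "ranked keyword search over data",
]

def tokenize(doc):
    # Hypothetical tokenizer: lowercase whitespace split.
    return doc.lower().split()

# Delta_d: the corpus keyword dictionary (d distinct keywords in a fixed order).
dictionary = sorted({w for doc in corpus for w in tokenize(doc)})

# gamma[i, j] = e_ij, the frequency of keyword w_j in document D_i.
gamma = np.zeros((len(corpus), len(dictionary)), dtype=int)
for i, doc in enumerate(corpus):
    counts = Counter(tokenize(doc))
    for j, w in enumerate(dictionary):
        gamma[i, j] = counts[w]

print(dictionary)  # ['data', 'encrypted', 'keyword', 'over', 'ranked', 'search', 'secure']
print(gamma)       # one row per document, one column per keyword
```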
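The next sketch computes the average vector A, the threshold τ, and the required-ciphertext vector φ from γ, following the two formulas above. The guard for a zero threshold and the example matrix are our own assumptions, not part of the scheme as stated.

```python
# Sketch of the RequiredCiphertext step (illustrative assumptions noted inline).
import math

import numpy as np

def required_ciphertexts(gamma):
    """Return phi = (l_1, ..., l_d), the required-ciphertext count per keyword."""
    # A = (a_1, ..., a_d): column averages, a_i = (sum over j of e_ji) / n.
    averages = gamma.mean(axis=0)

    # Threshold tau: floor of the minimum element of A.
    tau = math.floor(averages.min())
    if tau == 0:
        tau = 1  # assumption: avoid division by zero for very sparse toy data

    # l_i = ceil(max_j e_ji / tau): ceiling of each column maximum over tau.
    column_max = gamma.max(axis=0)
    return np.ceil(column_max / tau).astype(int)

# Example with a 3 x 4 document-term matrix (hypothetical values).
gamma = np.array([[4, 1, 0, 2],
                  [5, 2, 1, 1],
                  [3, 1, 2, 2]])
print(required_ciphertexts(gamma))  # -> [5 2 2 2], since tau = floor(min(A)) = 1
```

Each keyword w_i would then be assigned l_i ciphertext variants, which is what breaks down the keyword frequencies as described above.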