Page 91 - Read Online

P. 91

Page 84 Salmani et al. J Surveill Secur Saf 2020;1:79–101 I http://dx.doi.org/10.20517/jsss.2020.16

4 LRSE SCHEME
Tomeettheserequirements, weproposetheLRSEscheme. LRSEisaprivacypreservingmulti-keywordranked
search mechanism over encrypted files. We adopt the secure inner product proposed by Wong et al. [14] . The
index SI and the query q are both protected using this encryption strategy.

The key idea in our approach is to generate a variety of ciphertexts for high frequency keywords from the
corpus D. Hence, multiple encrypted versions of a keyword will be used to search for the same keyword. This
directly affects the search pattern ψ that the adversary collects to discover the keyword plaintext. We propose
two schemes: Scheme I introduces a solution for searching with sequential scan using our novel chaining
notation; and Scheme II, expresses how the chaining notion can be employed to handle controlled searching
with an outsourced encrypted index.

4.1 Scheme I - sequential scan
Recall that in Definition 1 a searchable encryption scheme contains a suite of 6 algorithms. Here, we explain
how each algorithm works in Scheme I. In the setup phase, beside calling the KeyGen to generate the secret
key K, we generate a document-term matrix and a required ciphertext vector. Scheme I’s details are presented
here:

• Setup. Let γ be a n × d document-term matrix(DTM) which an element e ij represents the frequency of
keyword w j in document D i. Let φ = (l 1 ,. . .,l d ) be required-ciphertext vector which l i ,1 ≤ i ≤ d shows

the number of required ciphertext for keyword w i. We first extracts the corpus keyword dictionary ∆ d
from the entire corpus D and then generates the document-term matrix γ. Afterwards, we generate the
the required-ciphertext vector φ (see below for more details).Then we call KeyGen algorithm to generate
the secret key K that will be used to encrypt the documents.

• RequiredCiphertext. The probability of querying high frequency keywords is high, so these keywords are
more prone to privacy leakages. In a strong attack model [12] where the cloud server is equipped with more
knowledge such as the term frequency statistics, the attacker(cloud) can extract invaluable information
form the encrypted files [16] . Moreover, by issuing each query to the cloud, the data user is revealing more
private information such as her interests and hobbies. Thus, there are two main challenges: 1) hiding the
keywordfrequenciesintheoutsourcedencrypteddocumentsand2)obscuringthefrequencyofthequeried
keywords which may lead to exposing the underlying keywords [12] .

In an ideal world the keyword frequencies are equal, and the data user uniformly queries all of the available
keywords. Thus, the cloud observation from the encrypted files and the queries does not leak any informa-
tion. Although this is impossible in the real world, our goal is to break down the keyword frequencies to
get close to the ideal scenario. For this reason we calculate the average of each column (average of each key-
word in the document collection D). Note that because we are employing uniform distribution the average
and the median are effectively the same. Let A = (a 1 ,. . .,a d ) be the average vector in which a i ,1 ≤ i ≤ d
∑ j ≤n
e ji
j=1
shows the average of keywords w i in corpus D; thus,∀ w i ∈ ∆ d ,a i = .
n
Todeterminethenumberofrequiredciphertextsforeachkeyword(l i)wedefineanewmeasureasthreshold
τ which is the floor of the minimum element in the average vector A. Finally, we determine the number
of required-ciphertext vector φ by calculating the ceiling of the maximum frequency for each keyword in
DTM-matrix γ divided by the threshold τ:

⌈ 1≤i≤n ⌉
max ei
∀ w i ∈ D, k i =
τ

where max ei is the maximum value in the w i column.

86 87 88 89 90 91 92 93 94 95 96