Page 93 - Read Online

P. 93

Page 86 Salmani et al. J Surveill Secur Saf 2020;1:79–101 I http://dx.doi.org/10.20517/jsss.2020.16

Algorithm 1 Encryption
1: procedure Encryption( D,K,∆ d , φ)
ˆ
2: ∆ d = BuildChain(K,∆ d , φ)
0
3: for all D i in D do
4: while !eo f do
5: w = readNextWord(D i)
6: if isKeyword(w) then
ˆ
7: ˆ w = select randomly an encrypted ciphertext for w from ∆ d 0
8: else
9: ˆ w = encrypt w with secret key K
10: end if
11: C i + = ˆw
12: end while
13: Add(C,C i )
14: end for
15: return C
16: end procedure

Since the cloud sees the encrypted document collection C, it generates the DTM matrix (γ ) based on the
0
encrypted keywords in C. Thus, the number of columns in γ is d rather than d (d ≤ d ) and the high
0
0
0
frequency keywords are eliminated in the whole matrix.
• GenerateQuery. In the initialization phase, the data owner and the data user exchange φ vector and the
secret key K that enable the data user to make an encrypted query q. In addition, because we have multiple
encryptedversions(ciphertexts)ofeachkeyword, thedatausercanuseaportionoftheavailableciphertexts
for each keyword, but the data user must use the same portion for all of the keywords in the same query to
not effect the results. For example, the data user may decide to use sixty percent of available ciphertext for
each keyword, but he cannot employ sixty percent for the first keyword and forty percent for the second
one, because it makes the results imprecise. Finally, encrypted query q is sent to the cloud. The data user
may send an optional parameter k to the cloud to retrieve only the top-k resultant documents.
This is one of the characteristics that distinguishes our approach from other schemes. With each query
the data user is able to randomly choose some of the ciphertext for each keyword which delivers more
uncertaintyandconsequentlymoreentropy. Thus, evenifconsecutivequeriessharesomeoftheirkeywords,
the cloud is not able to find a pattern between the queries due to using different versions of ciphertext in
eachquery. Moreover, co-occurringtermsappearwithdifferentciphertextintheencryptedfiles, so, finding
the co-occurring terms becomes significantly more difficult for the cloud.
The details of Query is shown in Algorithm 2. The data user declares the “portion” manually or it can
be determined randomly by the algorithm (like we did in the Algorithm 2). This feature determines the
percentage of each keyword ciphertext that will be employed in the query encryption process. For example,
if w i possesses five different ciphertext and the portion is set to sixty percent, the algorithm employs three
versions of the ciphertexts randomly for the current query. Moreover, the data user is able to generate the
ciphertexts as all encrypted versions are chained together. Note that the plain query can be indicated by
today’s web search engine such as Bing® and Google®, in which the data users tend to provide a sentence in
natural languages or a set of keywords to express their intentions. In this case, we first extract the keywords
∆ q from the plain query.

• Search. Before explaining the LRSE search algorithm we define the document and query vector and the

88 89 90 91 92 93 94 95 96 97 98