Salmani et al. J Surveill Secur Saf 2020;1:79–101 | http://dx.doi.org/10.20517/jsss.2020.16 Page 81

key idea in our scheme is to exploit our chaining encryption notion to generate a variety of ciphertexts for
high-frequency keywords, which leads to more uncertainty and a uniform probability model for the keyword
distribution.
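
The effect described above can be illustrated with a minimal sketch. It is not the chaining encryption scheme itself, which is defined later in the paper; it only shows, under assumptions of our own, how mapping a frequent keyword to several distinct tokens flattens the observable frequency distribution. The helper names `token` and `flatten`, the HMAC-based stand-in for a ciphertext, and the `target` parameter are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

def token(key: bytes, keyword: str, variant: int) -> str:
    """Hypothetical stand-in for a ciphertext: HMAC of the keyword and a variant number."""
    msg = f"{keyword}|{variant}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()[:12]

def flatten(key: bytes, occurrences: list[str], target: int) -> list[str]:
    """Replace each keyword occurrence with a token, moving to a new
    variant after every `target` uses so no token appears more than `target` times."""
    seen = Counter()
    out = []
    for kw in occurrences:
        variant = seen[kw] // target  # 0 for the first `target` uses, then 1, ...
        seen[kw] += 1
        out.append(token(key, kw, variant))
    return out

key = b"demo-key-not-for-production"
# "cloud" is three times as frequent as the other two keywords.
occurrences = ["cloud"] * 9 + ["privacy"] * 3 + ["index"] * 3
tokens = flatten(key, occurrences, target=3)

plain_counts = sorted(Counter(occurrences).values())
token_counts = sorted(Counter(tokens).values())
print(plain_counts)  # skewed keyword frequencies
print(token_counts)  # flattened token frequencies
```

In this toy input the plaintext frequencies are skewed (9, 3, 3), whereas every token occurs the same number of times, so an observer who sees only tokens cannot single out the frequent keyword by counting.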

The contributions of this work are:

1. We explore the problem of leakless privacy-preserving multi-keyword ranked search over encrypted cloud
data. We build on a private model to prevent (without compromising efficiency):

(a) Search pattern attack (tracking the keywords searched by two or more queries)

(b) Co-occurrence attack (determining keywords with similar frequency)

2. Our methodological contribution is a novel chaining encryption notion which prevents the aforementioned
attacks.

3. We demonstrate the correctness of our proposed method through privacy and security analysis.

To achieve our goals (see Section 2.1) there are two approaches. One possibility is to create an index which
decreases the search time. In the related literature two types of index are considered [4]: (1) building
an index for each document D_i [15]; (2) designing an index which encompasses the entire corpus [10,16]. The
alternative approach is to perform a sequential scan without an index. When the documents are large, an index
will likely be faster than sequential search, but on the flip side, storing and updating the index increases
overhead considerably. Either approach would be appropriate here depending on the corpus's characteristics
such as file length and file modification frequency.
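
The trade-off between the two approaches can be seen in a small plaintext sketch. Encryption is deliberately omitted, and the toy corpus and the function names `sequential_scan`, `build_index`, and `indexed_lookup` are assumptions made for illustration only; they are not the schemes of Section 4.

```python
from collections import defaultdict

# Toy plaintext corpus (hypothetical data).
docs = {
    "d1": "encrypted cloud data search",
    "d2": "ranked keyword search",
    "d3": "cloud storage overhead",
}

def sequential_scan(docs: dict[str, str], keyword: str) -> list[str]:
    """No index: every document is read on every query."""
    return sorted(doc_id for doc_id, text in docs.items() if keyword in text.split())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Corpus-wide inverted index: built once, must be updated when files change."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def indexed_lookup(index: dict[str, set[str]], keyword: str) -> list[str]:
    """With an index: one dictionary lookup per query keyword."""
    return sorted(index.get(keyword, set()))

index = build_index(docs)
scan_hits = sequential_scan(docs, "cloud")
index_hits = indexed_lookup(index, "cloud")
print(scan_hits, index_hits)
```

Both functions return the same documents; the difference is where the cost is paid. The scan touches every file per query, while the index answers quickly but has to be stored and rebuilt or patched whenever a file is modified.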

We first describe the sequential search scheme using our novel chaining notion (see Section 4.1). Next we
present our second scheme, which benefits from an index for the whole corpus (see Section 4.2). Note that, in
the second scheme we exploit the chaining notion idea in generating the index vectors even though it (chaining
notion) is not employed to encrypt the documents.

1.1 System model
Our system model, as illustrated in Figure 1, involves three different entities: data owner, data users, and cloud
server. The data owner has a collection of documents D (files) to be outsourced to the cloud server. Since files
may contain sensitive information and the cloud server is not fully trusted, data must be encrypted (C), and
any kind of information leakage that jeopardizes data privacy is inadmissible. Moreover, for the sake of
effective data utilization and to ensure precise results, the cloud server must apply the search requests (queries)
on the encrypted data. Hence, before outsourcing the data onto the cloud, the data owner extracts a set of
keywords Δ_d to build an encrypted searchable index SI. We extract the keywords before encrypting the data,
so the keywords with a high frequency get encrypted into a number of ciphertexts. As a result, it becomes
significantly more difficult for the cloud to track specific keywords in documents, as we will explain shortly.
Both the encrypted index and encrypted data are then transferred to the cloud server.

To search for files of interest, an authorized user first acquires a key K from the data owner through a search
control mechanism such as broadcast encryption [4]. Upon receiving the encrypted search request q from a
data user, the cloud server applies the request on the corresponding index SI and returns the results R(q).
To increase result precision, the results are ranked based on their relevance to the request by the cloud server.
Furthermore, to reduce the communication cost, the data user may send an optional number k along with q,
so the cloud server only sends back the top-k documents that are most relevant to the search request [14].
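
The ranked top-k step of this model can be sketched as follows. This is a plaintext illustration only: the binary keyword vectors, the inner-product relevance score, and the function name `top_k` are assumptions made here, since this section does not specify how SI is encoded or how relevance is computed. In the actual system both the index and the query would be encrypted.

```python
from typing import Optional

# Hypothetical vocabulary; each vector position marks one keyword.
vocabulary = ["cloud", "index", "privacy", "search"]

# Stand-in for the searchable index SI: one binary vector per document.
search_index = {
    "d1": [1, 0, 1, 1],
    "d2": [0, 1, 0, 1],
    "d3": [1, 1, 0, 0],
}

def top_k(index: dict[str, list[int]], query: list[int],
          k: Optional[int] = None) -> list[tuple[str, int]]:
    """Score each document by inner product with the query, rank by score
    (ties broken by document id), and return the top k matching documents."""
    scores = {
        doc_id: sum(a * b for a, b in zip(vec, query))
        for doc_id, vec in index.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked = [(doc_id, s) for doc_id, s in ranked if s > 0]  # drop non-matching documents
    return ranked if k is None else ranked[:k]

# Query q for the keywords "cloud" and "search", with optional k = 2.
q = [1, 0, 0, 1]
results = top_k(search_index, q, k=2)
print(results)
```

When k is omitted, all matching documents are returned in ranked order; passing k limits the response size, which is the communication saving described above.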

The rest of this paper is organized as follows: Section 2 presents our threat model, design goals, and the pre-
