Salmani et al. J Surveill Secur Saf 2020;1:79–101 I http://dx.doi.org/10.20517/jsss.2020.16 Page 83
outputs $R(q)$, a set of document identifiers whose corresponding documents are the most relevant files to
the query $q$.
6. Decryption ($C_i$, $K$): The decryption algorithm takes an encrypted data file $C_i \in C$ and a secret key $K$ as
input, and outputs $D_i$.
Definition 2. (History). Let $D$ be a document corpus. Let $\Delta_{q_i}$ be a set of queried keywords of query $q_i$. A
history over $D$ is a tuple $H = (D, \Delta_{q_1}, \ldots, \Delta_{q_t})$ over $t$ queries.
The history is information that we are trying to hide from an adversary (cloud).
Definition 3. (Search Pattern). The search pattern over a history $H$ is a tuple,
$\Psi = (\hat{\Delta}_{q_1}, \ldots, \hat{\Delta}_{q_t})$, over $t$ queries, where $\hat{\Delta}_{q_i}$, $1 \le i \le t$, is a set of
encrypted keywords in the $i$-th query.
Definition 4. (Access Pattern). The access pattern over a history $H$ is a set, $\Omega = (R(q_1), \ldots, R(q_t))$, over $t$
queries.
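Definitions 2 to 4 can be made concrete with a small sketch. The Python below is an illustration only, not the scheme's construction: `encrypt_keyword` is a hypothetical stand-in that uses a deterministic HMAC-SHA256 tag, and `toy_search` is an assumed ranking function that plays the role of $R(q)$. All names, the key, and the toy corpus are assumptions introduced for the example.

```python
import hashlib
import hmac


def encrypt_keyword(key: bytes, keyword: str) -> str:
    """Hypothetical keyword encryption: a deterministic HMAC-SHA256 tag."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


def toy_search(corpus, query):
    """Toy R(q): ids of documents sharing a keyword with q, most overlaps first."""
    scored = [(len(words & query), doc_id) for doc_id, words in corpus.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


def build_patterns(key, corpus, queries, search):
    """Return (history H, search pattern Psi, access pattern Omega) for t queries."""
    # H = (D, Delta_q1, ..., Delta_qt): the plaintext view that should stay hidden.
    history = (corpus, *queries)
    # Psi: one set of encrypted keywords per query.
    search_pattern = tuple(
        frozenset(encrypt_keyword(key, w) for w in q) for q in queries
    )
    # Omega: the result identifiers R(q_i) returned for each query.
    access_pattern = tuple(tuple(search(corpus, q)) for q in queries)
    return history, search_pattern, access_pattern


corpus = {"d1": {"cloud", "privacy"}, "d2": {"cloud"}, "d3": {"index"}}
queries = [{"cloud", "privacy"}, {"index"}, {"privacy", "cloud"}]
key = b"demo-key-not-for-real-use"

history, search_pattern, access_pattern = build_patterns(
    key, corpus, queries, toy_search
)
```

Because the stand-in encryption is deterministic, the first and third entries of `search_pattern` are identical, so an observer who sees only $\Psi$ can tell that the same query was issued twice without knowing the key.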
2.3 Threat model
We consider the cloud server an “honest-but-curious” entity in our model [4,9–11,14]. This means the cloud server
complies with the designated protocol (“honest”), but it is eager to collect more information by analyzing the
encrypted data, message flows, and the index (“curious”). In our scheme, we assume that the cloud server
knows the employed encryption and decryption methods, in addition to the encrypted documents $C$ and
index $SI$. However, it does not know the key $K$. We are willing to leak the document identifiers $id(D_i)$, $1 \le i \le n$,
the encrypted queries, and the access pattern defined in Definition 4. We can assume that the document sizes will
also be leaked, but this leakage can be trivially prevented by a “padding” method [4]. Thus, we classify the attack model
as a Known Ciphertext Attack, in which the adversary observes only the ciphertext, i.e., the encrypted documents
$C$, the encrypted index $SI$, and the queries.
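The padding remark can be illustrated with a minimal sketch. The text does not specify the padding method of [4], so the Python below assumes one simple option: every document is padded to the length of the longest one before encryption, with a 4-byte length prefix so the padding can be removed after decryption. The function names and the prefix width are assumptions made for this example.

```python
LENGTH_PREFIX = 4  # bytes reserved to record the true document length (assumed)


def pad(plaintext: bytes, target: int) -> bytes:
    """Prefix the true length, then append zero bytes up to `target` bytes."""
    body = len(plaintext).to_bytes(LENGTH_PREFIX, "big") + plaintext
    if len(body) > target:
        raise ValueError("target is smaller than the prefixed document")
    return body + b"\x00" * (target - len(body))


def unpad(padded: bytes) -> bytes:
    """Recover the original bytes using the stored length prefix."""
    true_len = int.from_bytes(padded[:LENGTH_PREFIX], "big")
    return padded[LENGTH_PREFIX:LENGTH_PREFIX + true_len]


def pad_corpus(documents):
    """Pad every document to the size of the longest one in the corpus."""
    target = LENGTH_PREFIX + max(len(d) for d in documents.values())
    return {doc_id: pad(d, target) for doc_id, d in documents.items()}


documents = {"d1": b"short", "d2": b"a much longer document body"}
padded = pad_corpus(documents)
```

Padding is applied to the plaintext, so every encrypted file has the same length and the server learns nothing from file sizes. The cost is storage: padding to the maximum length can be wasteful when sizes vary widely, in which case padding to a small number of size buckets is a possible compromise that leaks only the bucket.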
3 PRIVACY REQUIREMENTS
To address security concerns and prevent an “honest-but-curious” server (see Section 2.3) from collecting
users’ personal information, the data owner applies symmetric-key cryptography before outsourcing data
to the cloud. Although cryptography impedes the cloud from prying into the data owner’s private data, it cannot
address all privacy concerns. Ideally, a cloud should learn nothing but the (encrypted) search results;
data privacy, and even security, is jeopardized if the cloud deduces any information from the index, accessed
files, queried keywords, etc. For example, by analyzing this information, the cloud server may infer the major
subject of a document, or even the file’s content [17]. Therefore, methods must be designed to prevent the cloud
from performing these kinds of association attacks. Data privacy and index privacy are default requirements in
the literature; in the following, we enumerate more challenging and more complex privacy requirements.
1. Search Pattern Privacy: Uncovering the relation between two or more search requests can lead to more
information leakage and data/user privacy violation. Also, the resultant documents, which are ranked based
on the query $q$, provide a good opportunity for the cloud to identify the keywords and their corresponding
outsourced documents. Disguising the search pattern from unauthorized parties is among the most
complex challenges in this field, and the related literature [10,11,18] has not yet completely addressed this issue.
2. Co-occurrence Keyword Privacy: Keywords with the same distribution pattern expose more privacy
violation risks. In other words, the privacy of keywords that often co-occur is tied together, and
compromising the privacy of one term can lead to a privacy violation of the co-occurring term. As a result,
the privacy level of co-occurring terms is lower than that of regular terms under the same conditions. Therefore,
we should protect and hide this term dependency to protect the co-occurring terms, or at least put them at
the same level of privacy protection as singularly occurring terms.
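To make the co-occurrence risk tangible, the following sketch measures how strongly two keywords share a distribution pattern across documents. It is only an illustration: the use of Jaccard similarity over posting lists is an assumption chosen for simplicity, not a measure taken from the text, and the corpus and function names are hypothetical.

```python
from itertools import combinations


def posting_lists(corpus):
    """Map each keyword to the frozenset of document ids that contain it."""
    postings = {}
    for doc_id, words in corpus.items():
        for word in words:
            postings.setdefault(word, set()).add(doc_id)
    return {word: frozenset(ids) for word, ids in postings.items()}


def cooccurrence(corpus):
    """Jaccard similarity of the posting lists for every keyword pair."""
    postings = posting_lists(corpus)
    scores = {}
    for a, b in combinations(sorted(postings), 2):
        shared = postings[a] & postings[b]
        union = postings[a] | postings[b]  # never empty: each list has >= 1 id
        scores[(a, b)] = len(shared) / len(union)
    return scores


corpus = {
    "d1": {"heart", "attack", "cloud"},
    "d2": {"heart", "attack"},
    "d3": {"cloud", "index"},
}
scores = cooccurrence(corpus)
```

In this toy corpus, "heart" and "attack" appear in exactly the same documents, so their score is 1.0. An adversary who identifies one of them from its access pattern has effectively identified the other, which is the dependency that item 2 argues should be hidden.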