Salmani et al. J Surveill Secur Saf 2020;1:79–101 I http://dx.doi.org/10.20517/jsss.2020.16 Page 97
[Figure omitted.] Figure 6. Entropy improvement of the first 20 documents.
[Figure omitted.] Figure 7. Standard deviation deduction in the first 20 documents.
identifying the keywords becomes.
Figure 7 demonstrates the standard deviation reduction of the first 20 documents, calculated by comparing the standard deviation of each original document vector with that of the corresponding LRSE vector. The results show at least an 80% σ reduction in each document. This means the keyword frequencies are at least 80% closer to each other, which preserves data privacy against attacks such as the frequency statistical analysis mentioned above.
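As a rough illustration of this metric, the per-document σ reduction can be computed as follows. The vectors below are hypothetical stand-ins (the actual flattened vectors come from the LRSE transformation itself):

```python
import statistics

# Hypothetical keyword-frequency vectors for one document.
# In LRSE, the transformed vector's entries are far closer together.
original = [12, 3, 7, 1, 9, 2]             # raw keyword frequencies
lrse = [5.1, 4.8, 5.0, 4.7, 5.2, 4.9]      # illustrative flattened counterpart

def sigma_reduction(orig, transformed):
    """Percentage drop in standard deviation after the transform."""
    s_orig = statistics.pstdev(orig)
    s_new = statistics.pstdev(transformed)
    return 100 * (s_orig - s_new) / s_orig

print(f"{sigma_reduction(original, lrse):.1f}% sigma reduction")
```

With vectors this flat, the reduction comfortably exceeds the 80% floor reported above.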
6.3 Query Vectors
Although document vectors are constant and barely change, query vectors are prone to change as user intentions and demands change. In other words, as the number of queries increases over time, more information, such as the access pattern, is revealed to the cloud. For this reason, we dedicate the third part of our analysis to query vectors. The analysis results show that LRSE protects the access pattern and the privacy of the queries even as the number of queries grows.
6.3.1 Euclidean Distance from Ideal Vector
To preserve the access pattern, the ideal case is that the cloud server sees all of the queried keywords with the same frequency. In other words, after receiving m search requests, the normalized vector of queries over n keywords is (1/n, 1/n, ..., 1/n), which means that in predicting the next queried keyword or discovering the underlying plain keywords, the cloud server has no better chance than flipping a coin, which is the best-case scenario.
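The ideal vector can be sketched directly; the vocabulary size below is illustrative:

```python
# Ideal normalized query-frequency vector over n keywords: every
# keyword appears with probability 1/n, so the server's best guess
# of the next queried keyword is no better than uniform random choice.
n = 8  # illustrative vocabulary size
ideal = [1.0 / n] * n

assert abs(sum(ideal) - 1.0) < 1e-9   # properly normalized
assert len(set(ideal)) == 1           # all keywords equally likely
```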
We apply the Euclidean distance measure to determine the distance between the ideal vector and the original/LRSE query vector. The smaller the Euclidean distance, the closer we are to the ideal vector, and the more private the data. To calculate the query vector, we computed the frequency of each queried keyword after every 3000 queries (for both original and LRSE queries). We then calculated its Euclidean distance from the ideal vector.
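A minimal sketch of this measurement, assuming a small illustrative vocabulary and a uniformly drawn batch of queries (real LRSE queries are encrypted trapdoors, not plaintext keywords):

```python
import math
import random

def normalized_frequencies(queries, vocab):
    """Normalized keyword-frequency vector of a query batch."""
    counts = [queries.count(w) for w in vocab]
    total = sum(counts)
    return [c / total for c in counts]

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

vocab = [f"kw{i}" for i in range(10)]          # illustrative keyword set
ideal = [1.0 / len(vocab)] * len(vocab)        # ideal uniform vector

# Hypothetical batch of 3000 queries, mirroring the batch size above.
random.seed(0)
batch = [random.choice(vocab) for _ in range(3000)]
freq = normalized_frequencies(batch, vocab)
print(f"distance from ideal: {euclidean(freq, ideal):.4f}")
```

A near-uniform batch yields a frequency vector very close to the ideal one; skewed real-world query streams sit much farther away.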
Figure 8 demonstrates the Euclidean distance improvement. The results show that the query frequency vector is at least 67% closer to the ideal vector after 30,000 search requests are submitted. Note that this is the minimum improvement, because we drew queries from a uniform distribution, randomly selecting keywords to create them. In the real world, however, users keep asking for documents in their fields of expertise or interest, which moves the original frequency vector farther away from the ideal vector.
6.3.2 Standard Deviation of Query Vectors
In Section 6.2.2 we explained the importance of having low standard deviation(σ) and analyzed the σ of LRSE
document vectors. In this section we study the σ reduction of the query vectors. Unlike the documents, the
number of the queries and consequently query vectors increases during the time and for this reason we show
the σ reduction over time. We employed the same methodology in Section 6.2.2 to evaluate the standard
deviation reduction.
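The over-time measurement can be sketched as below: after each batch of queries, the σ of the cumulative normalized frequency vector is recorded. Batch size and vocabulary size are illustrative assumptions, not the paper's exact parameters:

```python
import random
import statistics

# Track sigma of the cumulative query-frequency vector batch by batch.
random.seed(1)
vocab_size, batch_size = 10, 3000
counts = [0] * vocab_size

for batch in range(1, 11):
    # Draw one batch of uniformly random queries (illustrative workload).
    for _ in range(batch_size):
        counts[random.randrange(vocab_size)] += 1
    total = sum(counts)
    freqs = [c / total for c in counts]
    print(f"after {batch * batch_size} queries: sigma = {statistics.pstdev(freqs):.5f}")
```

As the cumulative count grows, the normalized frequencies concentrate around 1/n and σ shrinks, which is the trend the σ-over-time plots capture.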