Salmani et al. J Surveill Secur Saf 2020;1:79–101 I http://dx.doi.org/10.20517/jsss.2020.16 Page 95
Figure 4. Effect of portion on result accuracy over 5000 queries. [Plot: result accuracy, 50% to 100%, versus portion, 0 to 100%.]
Figure 5. Result accuracy based on top_k over 5000 queries. [Plot: result accuracy, 50% to 100%, versus Top_K, 0 to 30.]
We assumed that each query contains between five and ten keywords. In other words, for every query our
query simulator generates a random number in this range (5-10), which determines the number of keywords
in that query.
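The query generation step described above can be sketched as follows. This is a minimal illustration and not the authors' simulator: the function name `simulate_query`, the placeholder vocabulary, and the choice to draw distinct keywords uniformly at random are all assumptions made for the example.

```python
import random

def simulate_query(vocabulary, min_keywords=5, max_keywords=10, rng=None):
    """Draw one simulated query (hypothetical helper, not from the paper)."""
    rng = rng or random.Random()
    # randint is inclusive on both ends, matching the stated range 5-10.
    n_keywords = rng.randint(min_keywords, max_keywords)
    # Assumption: keywords within one query are distinct and uniformly chosen.
    return rng.sample(vocabulary, n_keywords)

# Placeholder keyword dictionary, used only for illustration.
vocabulary = [f"kw{i}" for i in range(100)]
query = simulate_query(vocabulary, rng=random.Random(0))
print(len(query), query)
```

Passing a seeded `random.Random` instance keeps a simulated run of 5000 queries reproducible.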
Our analysis includes result accuracy and privacy assessment. In general, the cloud server observes two groups
of vectors: document vectors and query vectors. Our experimental evaluation demonstrates a higher privacy
in both groups, and a higher level of result accuracy compared with Cao’s result precision [9].
6.1 Result Accuracy
Section 4.1 describes how a certain fraction of the available ciphertexts (the portion) is selected in every query.
Although the same portion is employed for each keyword in the query, it may affect the accuracy of the results
for the reason explained in Section 5.3. Specifically, there is a small chance of losing some result accuracy when the
number of available ciphertexts for a keyword is larger than its frequency in a specific document, because the
encrypted versions employed in the document may differ from the ones in the query. In other words,
when the cloud server returns the top-k documents based on their similarity to the query, some of the truly
relevant top-k documents may be excluded.
This issue occurs in Cao’s work [9] when dummy keywords are inserted into each document vector. Conversely,
to boost the privacy level, LRSE does not insert dummy keywords; instead, we employ multiple ciphertexts to
represent each keyword (based on its frequency). Because no noise is added to the document
vectors, we expect a higher level of result accuracy in LRSE. To evaluate the accuracy of the LRSE results,
we define the result accuracy $R_{acc} = \|K \cap K'\| / \|K\|$, where $K$ and $K'$ are the sets of expected result documents
and of documents retrieved by the cloud server using LRSE, respectively. Additionally, $\|A\|$ denotes the number of elements
in set $A$. Figure 4 and Figure 5 show our results.
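As a small worked illustration of the result accuracy measure, the sketch below computes the size of the intersection of the expected set K and the retrieved set K', divided by the size of K. The function name and the document identifiers are hypothetical and chosen only for the example; they are not taken from the LRSE implementation or its dataset.

```python
def result_accuracy(expected, retrieved):
    """Fraction of the expected documents K that also appear in K'."""
    expected, retrieved = set(expected), set(retrieved)
    if not expected:
        raise ValueError("expected result set K must be non-empty")
    return len(expected & retrieved) / len(expected)

# Hypothetical top-4 result sets: one expected document is missed.
expected_top_k = {"doc3", "doc7", "doc12", "doc20"}    # K
retrieved_top_k = {"doc3", "doc7", "doc12", "doc41"}   # K'
print(result_accuracy(expected_top_k, retrieved_top_k))  # 0.75
```

Averaging this value over all simulated queries would give one point on an accuracy curve for a fixed portion or top-k setting.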
Recall that the “portion” determines the percentage of each keyword's ciphertexts that is employed in the query
encryption process (see Section 4.1). Figure 4 shows the effect of portion on result accuracy over 5000 queries.
As the diagram shows, LRSE achieves more than 91% result accuracy even when only 10% of the available
ciphertexts are employed. Note that our calculation in Section 5.2 assumed that 40% of the ciphertexts are
employed; setting the portion to 10% increases the number of possible permutations, which makes it harder
for the cloud server to analyze the access pattern. Moreover, increasing the portion from 10% to 20% raises
the result accuracy by around 5%, to 95%, which seems to be a reasonable trade-off for gaining more result
accuracy.
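To make the role of the portion concrete, the following sketch picks a random subset of a keyword's available ciphertexts for one query. It is only an illustration under stated assumptions: the actual selection procedure is the one defined in Section 4.1, and the function name, the rounding rule (round up, at least one ciphertext), and the placeholder ciphertext labels are hypothetical.

```python
import math
import random

def select_portion(ciphertexts, portion, rng=None):
    """Pick a random subset of a keyword's ciphertexts (hypothetical helper)."""
    if not 0 < portion <= 1:
        raise ValueError("portion must be in (0, 1]")
    rng = rng or random.Random()
    # Assumption: round up and always use at least one ciphertext.
    count = max(1, math.ceil(portion * len(ciphertexts)))
    return rng.sample(ciphertexts, count)

# Placeholder labels standing in for the encrypted versions of one keyword.
available = [f"ct{i}" for i in range(20)]
for portion in (0.1, 0.2, 0.4):
    chosen = select_portion(available, portion, rng=random.Random(1))
    print(portion, len(chosen), chosen)
```

A smaller portion means fewer of the keyword's ciphertexts appear in any one query, which is the setting Figure 4 evaluates against result accuracy.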
Figure 5 demonstrates the effect of top-k on the result accuracy. As the figure shows, LRSE achieves more than
98% result accuracy for the top-3 documents. Top-10 or top-15 seems to be a reasonable top-k in our simulation,
since we have 203 books in our dataset. Even if we consider top-20 (which is about 10% of our repository), we have
more than 97% result accuracy. In comparison to MRSE (Cao’s work), LRSE achieves a higher precision in