
Salmani et al. J Surveill Secur Saf 2020;1:79–101  I http://dx.doi.org/10.20517/jsss.2020.16  Page 95


               [Figure 4: plot titled "Result Accuracy"; y-axis: Accuracy (50% to 100%); x-axis: Portion (0 to 100%).]
               Figure 4. Effect of portion on result accuracy over 5000 queries.

               [Figure 5: plot titled "Result Accuracy"; y-axis: Accuracy (50% to 100%); x-axis: Top_K (0 to 30).]
               Figure 5. Result accuracy based on top_k over 5000 queries.

               We assumed that each query contains between five and ten keywords. In other words, for each query our
               query simulator generates a random number in this range (5-10), which gives the number of keywords in
               that query.
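
The query generation described above can be sketched as follows. This is a minimal illustration, not the authors' simulator: the function name `simulate_query`, the placeholder vocabulary, and the choice to sample distinct keywords uniformly at random are all assumptions made for the example.

```python
import random

def simulate_query(vocabulary, min_keywords=5, max_keywords=10, rng=None):
    """Draw one random query (hypothetical helper).

    Picks a keyword count in [min_keywords, max_keywords], inclusive,
    then samples that many distinct keywords from the vocabulary.
    Uniform sampling of distinct keywords is an assumption.
    """
    rng = rng or random.Random()
    n_keywords = rng.randint(min_keywords, max_keywords)  # inclusive bounds
    return rng.sample(vocabulary, n_keywords)

# Placeholder vocabulary; the real keyword dictionary is not given here.
vocabulary = [f"kw{i}" for i in range(100)]
rng = random.Random(0)  # fixed seed so the run is repeatable
queries = [simulate_query(vocabulary, rng=rng) for _ in range(5000)]
```

Each element of `queries` is a list of five to ten keywords, matching the range assumed in the evaluation.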
               Our analysis includes result accuracy and privacy assessment. In general, the cloud server observes two groups
               of vectors: document vectors and query vectors. Our experimental evaluation demonstrates a higher level of
               privacy in both groups, and a higher level of result accuracy compared with Cao’s result precision [9].

               6.1  Result Accuracy
               As described in Section 4.1, a certain number of the available ciphertexts (the portion) is selected in every
               query. Although the same portion is employed for each keyword in the query, it may affect the accuracy of the
               results for the reason explained in Section 5.3. Thus, there is a small chance of losing some result accuracy
               when the number of available ciphertexts for a keyword is larger than its frequency in a specific document
               (because the encrypted versions employed in the document may differ from the ones in the query). In other
               words, when the cloud server returns the top-k documents based on their similarity to the query, some of the
               truly relevant top-k documents may be excluded.


               This issue occurs in Cao’s work [9] when dummy keywords are inserted into each document vector. Conversely,
               to boost the privacy level, LRSE does not insert dummy keywords; instead, we employ multiple ciphertexts to
               represent each keyword (based on its frequency). Because no noise is added to the document vectors, we
               expect to see a higher level of result accuracy in LRSE. To evaluate the accuracy of the LRSE results,
               we define the result accuracy $R_{acc} = \|K \cap K'\| / \|K\|$, where $K$ and $K'$ are the sets of expected
               result documents and documents retrieved by the cloud server using LRSE, respectively. Additionally, $\|A\|$
               denotes the number of elements in set $A$. Figure 4 and Figure 5 demonstrate our results.
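
The result accuracy defined above is the fraction of the expected result documents that also appear in the set retrieved by the cloud server. A minimal sketch of that computation follows; the function name `result_accuracy` and the document identifiers are hypothetical and chosen only for illustration.

```python
def result_accuracy(expected, retrieved):
    """Fraction of the expected result documents that were also retrieved.

    `expected` plays the role of the expected result set and `retrieved`
    the set returned by the server; both are collections of document IDs.
    """
    expected = set(expected)
    retrieved = set(retrieved)
    if not expected:
        raise ValueError("expected result set must not be empty")
    return len(expected & retrieved) / len(expected)

# Illustrative top-5 example: four of the five expected documents are returned.
expected_docs = {"d1", "d2", "d3", "d4", "d5"}
retrieved_docs = {"d1", "d2", "d3", "d4", "d9"}
acc = result_accuracy(expected_docs, retrieved_docs)  # 4 / 5 = 0.8
```

Because both sets have the same size k in a top-k setting, this measure coincides with precision at k under that assumption.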
               Recall that the “portion” determines the percentage of each keyword’s ciphertexts that will be employed in the
               query encryption process (see Section 4.1). Figure 4 shows the effect of portion on result accuracy over 5000
               queries. As the diagram shows, LRSE achieves more than 91% result accuracy even when only 10% of the available
               ciphertexts are employed. Note that in our calculation in Section 5.2 we assumed that 40% of the ciphertexts
               are employed; setting the portion to 10% increases the number of possible permutations, making it harder
               for the cloud server to analyze the access pattern. Moreover, increasing the portion from 10% to 20% raises
               the result accuracy by around 5%, to 95%, which seems to be a reasonable trade-off to gain more result
               accuracy.
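
To make the role of the portion concrete, the sketch below picks a fraction of one keyword's available ciphertexts for use in a query. It is only an illustration under stated assumptions: uniform random selection, rounding to the nearest whole count, and always keeping at least one ciphertext are choices made here, not details confirmed by Section 4.1, and `select_portion` is a hypothetical name.

```python
import random

def select_portion(ciphertexts, portion, rng=None):
    """Pick a fraction of a keyword's ciphertexts (hypothetical helper).

    `portion` is a fraction in (0, 1]. The count is rounded to the nearest
    integer and never drops below one; both rules are assumptions.
    """
    if not 0 < portion <= 1:
        raise ValueError("portion must be in (0, 1]")
    rng = rng or random.Random()
    count = max(1, round(portion * len(ciphertexts)))
    return rng.sample(ciphertexts, count)

# Placeholder ciphertext labels for a single keyword.
ciphertexts = [f"c{i}" for i in range(10)]
chosen = select_portion(ciphertexts, portion=0.2, rng=random.Random(1))
```

With ten available ciphertexts and a 20% portion, two ciphertexts are selected; a smaller portion leaves more of the keyword's ciphertexts unused in any single query.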


               Figure 5 demonstrates the effect of top-k on the result accuracy. As the figure shows, LRSE achieves more than
               98% result accuracy for the top-3 documents. Top-10 or top-15 seems to be a reasonable top-k in our simulation,
               since we have 203 books in our dataset. Even if we consider top-20 (which is about 10% of our repository), we
               have more than 97% result accuracy. In comparison to MRSE (Cao’s work), LRSE achieves a higher precision in