Page 29 - Read Online
P. 29

Page 10 of 22                  Ernest et al. Complex Eng Syst 2023;3:4  I http://dx.doi.org/10.20517/ces.2022.54


               2.2.5. Specification identification and development
               One of the most difficult, and important, aspects of formal verification is specification identification and de-
               velopment. Most often, these specifications come directly from system requirements. However, they can also
               come from other sources such as certification criteria and standards, external regulations such as Rules of En-
               gagement (RoE), and simulation and testing results. In this work, the specifications come from expert knowl-
               edge on desired system behavior with respect to performanceand safety of actions taken bythe AI. Specifically,
               that the AI will not take certain actions if unacceptable damage or losses will occur to friendly forces.


               Different types of specifications are created based on the FISs that are affected, the needed expressiveness of
               formal languages in which they will be translated, and the tools available for verifying the systems. For this
               study, four safety specifications were developed. Safety, in this context, refers to behavior such thatnothing bad
               ever happens, where ”bad” is defined as an undesirable set of system states. Some would help guarantee not
               only safe, but also ideal behavior from a performance perspective. However, other safety specifications solely
               focus on desired safety qualities at the expense of raw fitness performance.


               These specifications, in natural language, are as follows:
                • MedivacHealingSpec: Ifamarine’shealthisveryhighandthemarineisnotunderattack,itscorresponding
                  healing bid must be very low;
                • Marine FiringSpec: If a separate friendly marine has already bid to attack a target whose health is very low,
                  than a marine’s fire bid on that target must be very low;
                • Tank Firing Spec: If a friendly unit would take splash damage from a tank shot and the lowest health
                  friendly in splash range is between very low and low health, only fire at the target if it would do very high
                  damage to the enemy;
                • Tank Conservative Firing Spec: Never cause splash damage to any friendly unit, regardless of their health.

               Through setting the values to the corresponding terms utilized within these specifications to that of our MFs,
               we canquite simplyapply these specifications to a trained Fuzzy Treethroughthe formal verification tools with
               the Psion fuzzy logic package. If a counterexample is found where the specification does not hold, typically
               the designer must either change the spec, or as we will limit this study to, modify the MFs and/or RBs of the
               associated FISs.



               3. RESULTS
               3.1. Reinforcement learning
               EVE was instantiated to have a relatively small population, 60 chromosomes, each defining a set of MFs and
               RBs for all seven FISs. The training process was run for 50 generations, each chromosome being evaluated
               over three scenarios per generation. Thus, this system was trained over 9,000 Starcraft 2 engagements, which
               was sufficient in this case to reach high performance.


               Presented in Figure 6 is a plot showing the best, worst, and mean chromosome fitness within the popula-
               tion of 60 at each generation. Additionally, the running best chromosome’s fitness for each generation is
               displayed. Generally with the fitness function utilized in this system, if the chromosome successfully com-
               pleted its mission, it would score at least quite close to a positive number. A score of 389.49 was found for the
               best chromosome thus far. The breakdown of this fitness value is shown in Table 1.


               EVE has a heavy focus on maintaining diversity, and therefore especially with a generally smaller population, it
               isnotsurprisingtoseethatduringeachgenerationtheworstchromosomewasnotabletosuccessfullycomplete
               this difficult mission. The distribution between best, worst, and mean is at least one sign of a healthily diverse
               population throughout the generations. This can help prevent stagnation and ensure continued improvements.
   24   25   26   27   28   29   30   31   32   33   34