
Ernest et al. Complex Eng Syst 2023;3:4  I http://dx.doi.org/10.20517/ces.2022.54  Page 9 of 22


For example, the input combination of 0.04 normalized target health and 0.01 assigned attackers can be examined together with its resultant output, and an explanation can be generated from the membership function and rule labels: “Bid output is Very High because target health is Very Low and assigned attackers is None”. For more extensive fuzzy tree structures, this explanation can be repeated across subsequent cascaded FISs, allowing a linguistic explanation of the entire decision process to be constructed. This form of explainability and transparency will be heavily utilized during the formal verification process, because manual corrections will be made to the post-training model whenever formal methods find that a specification is not satisfied. Such corrections require directly changing the model's code at all levels, not just the input or output layers. A holistic understanding of any modifications made throughout this process is critical for any potential deployment of the system post-modification.
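The rule-label explanation described above can be sketched in a few lines. This is a minimal illustration, not the authors' FIS: the triangular membership shapes, their breakpoints, the label sets, the three-rule base, and the use of `min` for AND are all hypothetical choices. It fuzzifies the two inputs, finds the most strongly firing rule, and turns that rule's labels into a sentence; no defuzzification is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriMF:
    """Triangular membership function; shapes and breakpoints are hypothetical."""

    label: str
    a: float  # left foot (degree 0)
    b: float  # peak (degree 1)
    c: float  # right foot (degree 0)

    def degree(self, x: float) -> float:
        if x == self.b:
            return 1.0
        if x <= self.a or x >= self.c:
            return 0.0
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.c - x) / (self.c - self.b)


# Hypothetical membership functions over normalized [0, 1] inputs.
INPUT_MFS = {
    "target health": [
        TriMF("Very Low", 0.0, 0.0, 0.25),
        TriMF("Low", 0.0, 0.25, 0.5),
        TriMF("Medium", 0.25, 0.5, 0.75),
        TriMF("High", 0.5, 0.75, 1.0),
        TriMF("Very High", 0.75, 1.0, 1.0),
    ],
    "assigned attackers": [
        TriMF("None", 0.0, 0.0, 0.1),
        TriMF("Few", 0.0, 0.25, 0.5),
        TriMF("Many", 0.25, 1.0, 1.0),
    ],
}

# Hypothetical rule base: (antecedent labels, consequent label for "Bid").
RULES = [
    ({"target health": "Very Low", "assigned attackers": "None"}, "Very High"),
    ({"target health": "Low", "assigned attackers": "Few"}, "Medium"),
    ({"target health": "High", "assigned attackers": "Many"}, "Very Low"),
]


def explain(inputs: dict[str, float]) -> tuple[str, float]:
    """Build a linguistic explanation from the most strongly firing rule."""
    memberships = {
        name: {mf.label: mf.degree(x) for mf in INPUT_MFS[name]}
        for name, x in inputs.items()
    }
    best_strength, best_rule = -1.0, RULES[0]
    for antecedents, consequent in RULES:
        # AND is taken as min, a common t-norm (an assumption here).
        strength = min(memberships[var][label] for var, label in antecedents.items())
        if strength > best_strength:
            best_strength, best_rule = strength, (antecedents, consequent)
    antecedents, consequent = best_rule
    clauses = " and ".join(f"{var} is {label}" for var, label in antecedents.items())
    return f"Bid output is {consequent} because {clauses}", best_strength


sentence, strength = explain({"target health": 0.04, "assigned attackers": 0.01})
print(sentence)
```

For a cascaded fuzzy tree, the same routine could be applied to each FIS in turn, with each FIS's output label feeding the next explanation clause.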


               2.2.4. Reinforcement learning
The standard RL process for a GFT is to first create a portfolio of training scenarios over which each individual in the GA population is evaluated. This model was created using an open-source Python package for interfacing with SC2, which makes constructive runs through these scenarios possible [24]. Within this study, a single-mission portfolio is used to highlight the formal verification processes, but most applications would include a portfolio containing multiple holistic scenarios as well as specific training sub-problems [11,14].
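The portfolio-based evaluation loop can be sketched as follows. This is a hypothetical outline: the chromosome encoding, the `run_scenario` callable (standing in for an SC2 episode driven through the Python interface), and the choice to average fitness across scenarios are all assumptions, since the text does not state how per-scenario scores are combined.

```python
from typing import Callable, List, Sequence

Chromosome = List[float]  # hypothetical encoding of a GFT's parameters
RunScenario = Callable[[Chromosome, str], float]


def evaluate_population(
    population: Sequence[Chromosome],
    portfolio: Sequence[str],
    run_scenario: RunScenario,
) -> List[float]:
    """Score every chromosome over every scenario in the portfolio.

    Averaging across scenarios is an assumption; the combination rule
    is not specified in the source.
    """
    if not portfolio:
        raise ValueError("portfolio must contain at least one scenario")
    return [
        sum(run_scenario(chromosome, scenario) for scenario in portfolio) / len(portfolio)
        for chromosome in population
    ]


# Hypothetical stand-in for an SC2 episode runner: a deterministic toy score.
def toy_runner(chromosome: Chromosome, scenario: str) -> float:
    return sum(chromosome)


fitnesses = evaluate_population([[1.0, 2.0], [0.5, 0.5]], ["single_mission"], toy_runner)
print(fitnesses)  # [3.0, 1.0]
```

With a single-mission portfolio, as in this study, the average reduces to that one scenario's fitness.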


The manner in which performance is evaluated must also be defined, through the requisite fitness function for the GA. The fitness function used in this study is given in Equation (1).







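$$\text{Fitness} = (\text{MarinesAlive} \times 25.0) + \text{FriendlyHealth} - \text{HostileHealth} - \left(\frac{\text{Timesteps}}{100.0}\right) \tag{1}$$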


The magnitude range of this fitness function is not critical within EVE; what matters is the ability of the evolutionary process to differentiate the relative fitnesses of potential solutions, or chromosomes, in a manner that thoroughly rewards good behavior and punishes bad behavior. In this example, a flat 50-point reward is given for every marine alive at the end of the scenario. This is added to the total friendly health remaining, including that of the siege tank, which has a notably larger health pool than the marines. The hostile force's remaining health pool is then subtracted. Finally, there is a slight penalty for the number of timesteps taken to complete the scenario: once all other terms have reached optimality, the solution should ideally execute quickly in case additional threats are inbound to the force. This function may be iterated on in future work, but it serves as a good basis for the GA to evolve the population of chromosomes.
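Equation (1) can be written directly as a function. In this sketch the `EpisodeResult` record and its field names are hypothetical stand-ins for quantities read from the end of an SC2 episode, and the example numbers are made up for illustration. The per-marine weight is exposed as a parameter defaulting to the 25.0 coefficient shown in Equation (1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeResult:
    """End-of-scenario quantities used by Equation (1); field names are hypothetical."""

    marines_alive: int
    friendly_health: float  # total remaining friendly health, siege tank included
    hostile_health: float   # total remaining hostile health
    timesteps: int          # steps taken to finish the scenario


def fitness(result: EpisodeResult, marine_weight: float = 25.0) -> float:
    """Equation (1): marine bonus + friendly health - hostile health - time penalty."""
    return (
        result.marines_alive * marine_weight
        + result.friendly_health
        - result.hostile_health
        - result.timesteps / 100.0
    )


# Illustrative (made-up) outcomes that differ only in completion time.
fast = EpisodeResult(marines_alive=4, friendly_health=340.0, hostile_health=0.0, timesteps=1200)
slow = EpisodeResult(marines_alive=4, friendly_health=340.0, hostile_health=0.0, timesteps=2400)
print(fitness(fast), fitness(slow))  # 428.0 416.0
```

With otherwise identical outcomes, the faster run scores higher, which is the tie-breaking role the small time penalty is meant to play.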


There is an additional complexity with this particular problem due to the nature of StarCraft 2 and this training setup: non-deterministic fitness evaluations. Because the fitness value given to any chromosome within the population drastically affects both its probability of breeding and the relative worth of potential future chromosomes, the possibility of a good chromosome being “unlucky” and a bad one being “lucky” during their respective evaluations can damage the effectiveness of an evolutionary system. There are mitigating methods, such as evaluating the scenarios, or portions of them, multiple times. Within this study, each chromosome is evaluated a total of three times, and the worst fitness of those three is the fitness actually assigned. This simply mitigates the worst-case risk at the expense of the computational efficiency of each generation's evaluation.
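The worst-of-three rule can be sketched as a small wrapper around any non-deterministic evaluation. The `noisy_episode` function below is a hypothetical stand-in for an SC2 run, modeled as a fixed score plus uniform noise; it is not a model of the game's actual randomness.

```python
import random
from typing import Callable


def worst_of_n(evaluate: Callable[[], float], n: int = 3) -> float:
    """Run a non-deterministic evaluation n times and keep the worst (lowest) fitness."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return min(evaluate() for _ in range(n))


rng = random.Random(0)


def noisy_episode() -> float:
    # Hypothetical: an underlying fitness of 400 perturbed by episode randomness.
    return 400.0 + rng.uniform(-30.0, 30.0)


print(worst_of_n(noisy_episode))
```

Taking the minimum rather than the mean makes a chromosome's assigned fitness reflect its worst observed run, which is the conservative choice described above; the cost is n scenario runs per chromosome in every generation.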