Figure 3. Comparisons of average game score and number of victories in challenging scenarios for the manually tuned model (base), the product of reinforcement learning (RL), and randomly initialized models (random).
Figure 4. Rolling averages of several performance metrics throughout training.


between inputs and outputs are encoded in natural language. Similar systems were employed to determine the relative value of asteroids as targets and to dictate turning and shooting behaviors.
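The rule bases themselves are not reproduced here, but a small sketch may help show what such natural-language rules look like in code. The example below uses the scikit-fuzzy control API with hypothetical "distance" and "size" antecedents and a "value" consequent; the universes, membership functions, and rules are illustrative assumptions, not the actual Asteroid Smasher controller.

import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Hypothetical antecedents/consequent; universes and terms are illustrative only.
distance = ctrl.Antecedent(np.arange(0, 1001, 1), 'distance')  # e.g., pixels to asteroid
size = ctrl.Antecedent(np.arange(1, 5, 1), 'size')             # asteroid size class 1-4
value = ctrl.Consequent(np.arange(0, 101, 1), 'value')         # target priority

distance['near'] = fuzz.trimf(distance.universe, [0, 0, 400])
distance['far'] = fuzz.trimf(distance.universe, [300, 1000, 1000])
size['small'] = fuzz.trimf(size.universe, [1, 1, 3])
size['large'] = fuzz.trimf(size.universe, [2, 4, 4])
value['low'] = fuzz.trimf(value.universe, [0, 0, 60])
value['high'] = fuzz.trimf(value.universe, [40, 100, 100])

# Rules read as natural-language statements relating inputs to the output.
rules = [
    ctrl.Rule(distance['near'] & size['large'], value['high']),
    ctrl.Rule(distance['far'] | size['small'], value['low']),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input['distance'] = 150
sim.input['size'] = 4
sim.compute()
print(sim.output['value'])  # crisp target priority for this asteroid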

As previously discussed, the explainability of the FIS in applications such as control systems is already well established. Rather than further clarifying this, the described algorithm and Asteroid Smasher example instead serve to delineate and test an alternative to more commonly employed methods for the optimization of existing FIS-based architectures. Successful application here offers experimental evidence to join the relatively scarce literature on developing fuzzy systems with RL as opposed to historically prevalent approaches such as genetic algorithms.

3.3. Case study: reinforcement learning ANFIS for classic control environments
3.3.1. Introduction to ANFIS
The intersection of fuzzy logic and RL stands to offer more than an alternative method of post-hoc optimization for the former: notoriously opaque NN architectures also stand to benefit from integration with explainable fuzzy systems. In other words, neuro-fuzzy systems offer a means of taking advantage of the strengths of artificial NNs while peering a little further into the black-box models that find widespread use in RL applications and elsewhere.
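As a concrete sketch of this idea (and only a sketch, not the ANFIS architecture evaluated below), a first-order TSK layer can be written as an ordinary differentiable module whose Gaussian membership centres and widths, along with its linear rule consequents, are all trainable tensors. The PyTorch module and toy training step below are illustrative assumptions; the layer sizes, initialization, and loss are arbitrary.

import torch
import torch.nn as nn

class TinyTSK(nn.Module):
    """Minimal first-order TSK (Sugeno) neuro-fuzzy layer: Gaussian memberships
    with learnable centres/widths produce rule firing strengths, which weight
    linear rule consequents. Purely illustrative, not the paper's model."""

    def __init__(self, n_inputs: int, n_rules: int):
        super().__init__()
        self.centres = nn.Parameter(torch.randn(n_rules, n_inputs))
        self.log_sigmas = nn.Parameter(torch.zeros(n_rules, n_inputs))
        self.consequents = nn.Parameter(0.1 * torch.randn(n_rules, n_inputs + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, n_inputs)
        diff = x.unsqueeze(1) - self.centres                # (batch, rules, inputs)
        member = torch.exp(-0.5 * (diff / torch.exp(self.log_sigmas)) ** 2)
        firing = member.prod(dim=-1)                        # product t-norm
        weights = firing / (firing.sum(dim=-1, keepdim=True) + 1e-9)
        x1 = torch.cat([x, torch.ones(x.shape[0], 1)], dim=-1)
        rule_out = x1 @ self.consequents.t()                # per-rule linear output
        return (weights * rule_out).sum(dim=-1)             # weighted average (defuzzification)

# Toy gradient-descent step on random data, standing in for any RL or supervised loss.
model = TinyTSK(n_inputs=4, n_rules=8)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 4), torch.randn(32)
loss = torch.nn.functional.mse_loss(model(x), y)
optimiser.zero_grad()
loss.backward()
optimiser.step()

Because every membership-function and consequent parameter sits in the computation graph, the usual gradient-based tooling for NNs applies unchanged, while each rule's centres, widths, and linear coefficients remain individually inspectable.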


To bolster experimental results in this domain, we tested an ANFIS [30] (Figure 6) on OpenAI's classic control environment CartPole [51]. An example of the environment is shown in Figure 7. The ANFIS extends the TSK system by allowing the parameters of the fuzzy rules to be learned via gradient descent optimization rather