Figure 9. Comparisons between ANFIS DQN and DQN.

3.3.3. ANFIS DQN vs DQN
We compare the proposed ANFIS DQN architecture with a standard DQN architecture. Both models use a Double DQN structure with soft target updates and have a similar number of trainable parameters: 17,410 for the DQN and 17,407 for the ANFIS DQN. Both models also share the same optimization method, learning rate, and other settings, and are trained for 100,000 iterations. The full parameters and hyperparameters are listed in the appendix.
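For concreteness, the following is a minimal sketch, not the authors' code, of the shared Double DQN loss and soft (Polyak) target update used by both agents; the network objects, the batch layout, and the values of gamma and tau are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def double_dqn_loss(online, target, batch, gamma=0.99):
    """Double DQN target: the online net selects the next action,
    the target net evaluates it."""
    states, actions, rewards, next_states, dones = batch
    q = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_actions = online(next_states).argmax(dim=1, keepdim=True)
        next_q = target(next_states).gather(1, next_actions).squeeze(1)
        y = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.mse_loss(q, y)

def soft_update(online, target, tau=0.005):
    # Soft update: target <- tau * online + (1 - tau) * target
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p_o.data)
```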

3.3.4. Experimental results
To evaluate the two models, we measure the mean reward across 10 test environments every 10,000 training iterations. The goal is for the agent to reach a mean reward of 500 across all 10 test environments.
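A sketch of this evaluation protocol is shown below; the Gymnasium-style environment API, the `make_env` factory, and a policy that returns per-action Q-values are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def evaluate(q_values_fn, make_env, n_envs=10):
    """Mean episodic return over `n_envs` independent test environments."""
    returns = []
    for i in range(n_envs):
        env = make_env(seed=i)            # placeholder environment factory
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action = int(np.argmax(q_values_fn(obs)))  # greedy action
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total += reward
        returns.append(total)
    return float(np.mean(returns))        # success when this reaches 500
```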

From Figure 9, we can see that both models can learn the environment. In some cases, the ANFIS DQN model learns the environment more quickly, as with seeds 9 and 42, while in other cases it performs about as well as the DQN, as with seeds 109 and 131. The most interesting case is seed 42, where the ANFIS DQN and the DQN solve the environment in the same number of steps, but the ANFIS DQN learns it more stably, retaining its solution beyond 60,000 iterations rather than degrading after converging. The same trend appears in seed 9, where the ANFIS DQN holds a stable solution after 70,000 iterations. From these tests, we can see that the ANFIS DQN is able to match, if not slightly outperform, the DQN agent.


3.3.5. Explainability
As suggested, a primary advantage of this approach over a non-fuzzy DQN is its capacity for explainability: one may visualize the fuzzy rule components before and after training to understand the changes made during optimization and the resulting behavior.
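As an illustration of what such a visualization might look like, the sketch below plots Gaussian membership functions before and after training; the centers and widths here are invented for display purposes and would come from the trained ANFIS layer in practice.

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_mf(x, center, sigma):
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

x = np.linspace(-3, 3, 400)
before = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]   # (center, sigma), pre-training
after  = [(-1.4, 0.6), (0.1, 1.3), (1.2, 0.4)]   # hypothetical post-training values

fig, axes = plt.subplots(1, 2, sharey=True)
for ax, params, title in zip(axes, (before, after), ("before", "after")):
    for c, s in params:
        ax.plot(x, gaussian_mf(x, c, s))
    ax.set_title(f"Fuzzy rule memberships ({title} training)")
plt.show()
```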

To illustrate this, an example concerning the approximation of a mathematical function is shown in Figure 10. In this case, the Gaussian functions corresponding to the fuzzy rules expand and contract under gradient descent
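A toy sketch of this mechanism, an assumption for illustration rather than the paper's experiment: a single Gaussian membership whose center and width are fit by gradient descent shifts and contracts toward a target curve.

```python
import torch

x = torch.linspace(-3, 3, 200)
target = torch.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)  # curve to approximate

center = torch.tensor(0.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)  # log-parameterized width stays positive
opt = torch.optim.SGD([center, log_sigma], lr=0.1)

for _ in range(500):
    mu = torch.exp(-0.5 * ((x - center) / log_sigma.exp()) ** 2)
    loss = torch.mean((mu - target) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, center should approach 1.0 and the width exp(log_sigma)
# should shrink toward 0.5: the Gaussian has shifted and contracted.
```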