



               3. REINFORCEMENT LEARNING WITH FUZZY SYSTEMS
RL in the context of fuzzy logic was introduced in [24] and based on the Mamdani framework. It was successfully used in the CartPole problem. RL TSK fuzzy systems have been developed for various applications [28,42].

               3.1. Reinforcement learning TSK fuzzy system
As learned behaviors in complex systems are often the result of intricate optimization processes, explainable RL constitutes a challenging problem. However, the interpretability of fuzzy systems indicates the potential value of a method of TSK fuzzy system optimization inspired by traditional Q-learning. In other words, a system in which one approximates the Q function with a TSK fuzzy system [28] stands to offer some unique benefits. This approach was successfully employed in applications such as a simulation in a simple discrete grid-world environment or in continuous environments such as CartPole or Lunar Lander [28].


The idea in the setting of TSK-based Q-learning is to let the Q function be approximated by a TSK fuzzy system, i.e.,

$$Q(s, a) = \mathrm{TSK}(s, a).$$
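
To make the approximation concrete, the following is a minimal sketch, assuming a zero-order TSK system with Gaussian antecedent memberships over the state and one constant consequent per rule and action; the class name, rule count, and hyperparameters are illustrative assumptions rather than the implementation used in [28].

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the authors' implementation) of a
# zero-order TSK fuzzy system used as a Q-function approximator, Q(s, a) = TSK(s, a).
class TSKQApproximator:
    def __init__(self, n_rules, state_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Antecedents: one Gaussian membership function per rule and state dimension.
        self.centers = rng.uniform(-1.0, 1.0, size=(n_rules, state_dim))
        self.widths = np.full((n_rules, state_dim), 0.5)
        # Consequents: one constant per rule and action (zero-order TSK).
        self.consequents = np.zeros((n_rules, n_actions))

    def firing_strengths(self, state):
        # Rule firing strength = product of Gaussian memberships over state dimensions.
        z = (np.asarray(state, dtype=float) - self.centers) / self.widths
        return np.exp(-0.5 * np.sum(z ** 2, axis=1))

    def q_values(self, state):
        # Normalized, firing-strength-weighted average of consequents: one Q-value per action.
        w = self.firing_strengths(state)
        return (w / (w.sum() + 1e-12)) @ self.consequents

    def q(self, state, action):
        return self.q_values(state)[action]
```

With one output per action, `np.argmax(model.q_values(s))` then plays the role of greedy action selection in Q-learning.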
The TD Q-learning update is given by

$$Q(s, a) = Q(s, a) + \alpha \left( r(s') + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$
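
For intuition, a one-step numeric check of this update with made-up values, say Q(s, a) = 1.0, r(s') = 0.5, γ = 0.9, max_{a'} Q(s', a') = 2.0, and α = 0.1:

```python
# Hypothetical values, chosen only to illustrate the arithmetic of the TD update.
q_sa, reward, gamma, q_next_max, alpha = 1.0, 0.5, 0.9, 2.0, 0.1
td_target = reward + gamma * q_next_max           # 0.5 + 0.9 * 2.0 = 2.3
q_sa_updated = q_sa + alpha * (td_target - q_sa)  # 1.0 + 0.1 * 1.3 = 1.13
```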
When updating the parameters of a system in RL, we can use the general equation

$$\theta_i = \theta_i + \alpha \left( r(s') + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) \frac{\partial Q(s, a)}{\partial \theta_i}$$

with $\theta_i$, $i = 1, \ldots, n$, being the parameters of the system. As a result, in the context of a TSK fuzzy system, we can update the parameters of the fuzzy system using this approach as

$$\theta_i = \theta_i + \alpha \left( r(s') + \gamma \max_{a'} \mathrm{TSK}(s', a') - \mathrm{TSK}(s, a) \right) \frac{\partial \mathrm{TSK}(s, a)}{\partial \theta_i}.$$

Here, $\theta_i$ are the parameters of the fuzzy system. Just as in standard RL, one may wish to mitigate instability via techniques such as experience replay or the use of an actor-critic architecture.
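
As an illustration only, under the assumptions of the zero-order TSK sketch given earlier (for which the partial derivative of TSK(s, a) with respect to a consequent parameter is simply the normalized firing strength of the corresponding rule), the update could look roughly as follows; the function name and hyperparameters are hypothetical.

```python
import numpy as np

# Semi-gradient TD update of the consequent parameters of the TSK sketch above.
# Illustrative assumptions: only consequents are trained; the target is greedy (max over actions).
def tsk_q_update(model, state, action, reward, next_state, alpha=0.05, gamma=0.99):
    w = model.firing_strengths(state)
    grad = w / (w.sum() + 1e-12)                  # d TSK(s, a) / d consequent_i
    td_target = reward + gamma * np.max(model.q_values(next_state))
    td_error = td_target - model.q(state, action)
    # theta_i <- theta_i + alpha * (TD error) * d TSK(s, a) / d theta_i
    model.consequents[:, action] += alpha * td_error * grad
    return td_error
```

Experience replay would then amount to storing (state, action, reward, next_state) transitions in a buffer and applying this update to minibatches sampled from it.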


To give a theoretical foundation to our proposed algorithm, we include here an initial discussion on convergence. First, we observe that the Q-learning algorithm is known to be convergent [43,44] under standard assumptions on the given Markov Decision Process (MDP). Additionally, NNs with various activation functions are known to be universal approximators [45,46]. Combining the above results and the conclusions in [47,48], we can approximate the Q function by the output of the NN, which leads to the convergence of Deep Q-learning models. It is known that TSK fuzzy systems are universal approximators [49,50], i.e., they have approximation properties similar to those of NNs. Together, the ideas above allow the conclusion that replacing the NN in a DQN architecture with a TSK fuzzy system will retain the same properties as DQNs. In summary, the TSK fuzzy system is an approximator of the Q function. Therefore, the Q-learning algorithm with TSK fuzzy systems is convergent. The above theoretical motivation warrants a deeper investigation of the approximation properties of RL TSK fuzzy systems as a topic for future research.

               3.2. Case study: Reinforcement learning TSK system for Asteroid Smasher
               3.2.1. Problem description
To test the algorithm, we created optimized TSK fuzzy systems through RL to play a variant of the game Asteroids called Asteroid Smasher (see Figure 1). Developed by the University of Cincinnati for the Explainable Fuzzy AI Challenge (XFC 2022), the game environment incorporates additional complexities to increase both the difficulty and value of explainable automation. These include the addition of multiple agents, screen-wrap for ships and hazards, and unique scenarios to test edge cases.