3. REINFORCEMENT LEARNING WITH FUZZY SYSTEMS
RL in the context of fuzzy logic was introduced in [24] and was based on the Mamdani framework. It was successfully applied to the CartPole problem. RL TSK fuzzy systems have been developed for various applications [28,42].
3.1. Reinforcement learning TSK fuzzy system
As learned behaviors in complex systems are often the result of intricate optimization processes, explainable RL constitutes a challenging problem. However, the interpretability of fuzzy systems indicates the potential value of a method of TSK fuzzy system optimization inspired by traditional Q-learning. In other words, a system in which one approximates the Q function with a TSK fuzzy system [28] stands to offer some unique benefits. This approach was successfully employed in applications such as a simulation in a simple discrete grid-world environment, as well as in continuous environments such as CartPole or Lunar Lander [28].
The idea in the setting of TSK-based Q-learning is to take the Q function to be approximated by a TSK fuzzy system, i.e.,
$$Q(s, a) = f_{\mathrm{TSK}}(s, a),$$
where $f_{\mathrm{TSK}}$ denotes the output of the TSK fuzzy system.
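For concreteness, the following is a minimal sketch of such an approximator: a zero-order TSK system with Gaussian antecedent membership functions over the state and one constant consequent per rule and per discrete action. The class name, rule layout, and use of NumPy are illustrative assumptions, not the implementation used in [28].

```python
import numpy as np

class TSKQApproximator:
    """Zero-order TSK fuzzy approximation of Q(s, a) (illustrative sketch).

    Each rule has a Gaussian membership function per state dimension and a
    constant consequent per discrete action; Q(s, a) is the weighted average
    of the consequents for action a, weighted by normalized firing strengths.
    """

    def __init__(self, n_rules, state_dim, n_actions, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.centers = rng.normal(size=(n_rules, state_dim))  # rule centers
        self.widths = np.ones((n_rules, state_dim))           # rule widths
        self.consequents = np.zeros((n_rules, n_actions))     # rule outputs

    def firing_strengths(self, state):
        # Product of per-dimension Gaussian memberships for each rule.
        z = (np.asarray(state) - self.centers) / self.widths
        return np.exp(-0.5 * np.sum(z ** 2, axis=1))

    def q_values(self, state):
        w = self.firing_strengths(state)
        w = w / (np.sum(w) + 1e-12)   # normalized firing strengths
        return w @ self.consequents   # Q(s, a) for every action a
```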
The TD Q-learning update is given by
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r(s) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right),$$
where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
When updating the parameters of a system in RL, we can use the general update
$$\theta_i \leftarrow \theta_i + \alpha \left( r(s) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right),$$
with $\theta_i$, $i = 1, \ldots, n$, being the parameters of the system. As a result, we can update the parameters of the fuzzy system using this approach in the context of a TSK fuzzy system as
$$\theta_i \leftarrow \theta_i + \alpha \left( r(s) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) \frac{\partial Q(s, a)}{\partial \theta_i}.$$
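Continuing the sketch above, the function below applies this rule to the consequent parameters of the zero-order TSK approximator; for those parameters, the gradient $\partial Q(s,a) / \partial \theta_i$ reduces to the normalized firing strength of the corresponding rule. The learning-rate and discount values are placeholders, not values from the paper.

```python
import numpy as np

def td_update(model, state, action, reward, next_state, alpha=0.05, gamma=0.99):
    """One TD(0) Q-learning step on the TSK consequent parameters (sketch)."""
    # TD target and TD error, as in the update equation above.
    target = reward + gamma * np.max(model.q_values(next_state))
    td_error = target - model.q_values(state)[action]

    # For a zero-order TSK system, dQ(s, a)/d(consequent of rule r, action a)
    # is the normalized firing strength of rule r, so the general update
    # reduces to a firing-strength-weighted correction of the consequents.
    w = model.firing_strengths(state)
    w = w / (np.sum(w) + 1e-12)
    model.consequents[:, action] += alpha * td_error * w
```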
Here, $\theta_i$ are the parameters of the fuzzy system. Just as in deep RL, one may wish to mitigate instability via techniques such as experience replay and the use of an actor-critic architecture.
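As a rough illustration of the first of these techniques, a minimal experience replay buffer that stores transitions and samples uniform random mini-batches for the update above could look as follows; the capacity and batch size are arbitrary choices.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive transitions, which helps stabilize the updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```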
To give a theoretical foundation to our proposed algorithm, we include here an initial discussion on convergence. First, we observe that the Q-learning algorithm is known to be convergent [43,44] under standard assumptions on the given Markov Decision Process (MDP). Additionally, NNs with various activation functions are known to be universal approximators [45,46]. Combining the above results with the conclusions in [47,48], we can approximate the Q function by the output of an NN, which leads to the convergence of Deep Q-learning models.
It is known that TSK fuzzy systems are universal approximators [49,50], i.e., they have approximation properties similar to those of NNs. Together, the ideas above support the conclusion that a DQN architecture in which the NN is replaced by a TSK fuzzy system retains the same convergence properties as a standard DQN. In summary, the TSK fuzzy system is an approximator of the Q function, and therefore the Q-learning algorithm with TSK fuzzy systems is convergent. This theoretical motivation warrants a deeper investigation of the approximation properties of RL TSK fuzzy systems as a topic for future research.
3.2. Case study: Reinforcement learning TSK system for Asteroid Smasher
3.2.1. Problem description
To test the algorithm, we created optimized TSK fuzzy systems through RL to play a variant of the game
Asteroids called Asteroid Smasher (see Figure 1). Developed by the University of Cincinnati for the Explainable
Fuzzy AI Challenge (XFC 2022), the game environment incorporates additional complexities to increase both
the difficulty and value of explainable automation. These include the addition of multiple agents, screen-wrap
for ships and hazards, and unique scenarios to test edge cases.