Page 131 - Read Online
P. 131

Page 156                         Boin et al. Intell Robot 2022;2(2):145­67  I http://dx.doi.org/10.20517/ir.2022.11

                                           Table 1. Parameters of the AV platoon environment
                                            Parameter                   Value
                                            Time step    interval       0.1 s
                                            Number of time steps per training episode  600
                                                                        1 s
                                            Time gap ℎ   
                                            Driveline dynamics coefficient     0.1 s
                                            Maximum absolute control input            2.5   /   2
                                            Reward coefficient          0.4
                                            Reward coefficient          0.2
                                            Reward coefficient          0.2
                                            Reward coefficient          0.2


























                     (a) The actor network for      .             (b) The critic network for      .

                                               Figure 5. Actor and critic networks for      .


                                           Table 2. Hyperparameters for the DDPG Algorithm
                                   Hyperparameter  Value
                                   Actor learning rate  5e-05
                                   Critic learning rate  0.0005
                                   Batch size    64
                                   Noise         Ornstein-Uhlenbeck Process with    = 0.15,    = 0.02
                                                                             −3
                                   Weights and Biases  random uniform distribution [−3 × 10 , 3 × 10 ] (final layer),
                                                                       −3
                                                 [           ]
                                   Initialization  − √  1  , − √  1  (other layers)
                                                                              
                                             Table 3. FRL Specific Initial Hyperparameters
                                        FRL type  Aggregation method  Hyperparmeter  Value
                                        Inter-FRL  Gradients    FRL update delay  0.1
                                        Inter-FRL  Gradients    FRL cutoff ratio  0.8
                                        Inter-FRL  Weights      FRL update delay  30
                                        Inter-FRL  Weights      FRL cutoff ratio  1.0
                                        Intra-FRL  Gradients    FRL update delay  0.4
                                        Intra-FRL  Gradients    FRL cutoff ratio  0.5
                                        Intra-FRL  Weights      FRL update delay  0.1
                                        Intra-FRL  Weights      FRL cutoff ratio  1.0


               performs simulations for a single 60 second episode using the trained models, calculating the cumulative re-
               ward of the model(s) in the experiment. The entire project is designed and implemented using Python3, and
   126   127   128   129   130   131   132   133   134   135   136