Page 131 - Read Online

P. 131

Page 156 Boin et al. Intell Robot 2022;2(2):14567 I http://dx.doi.org/10.20517/ir.2022.11

Table 1. Parameters of the AV platoon environment
Parameter Value
Time step interval 0.1 s
Number of time steps per training episode 600
1 s
Time gap ℎ
Driveline dynamics coefficient 0.1 s
Maximum absolute control input 2.5 / 2
Reward coefficient 0.4
Reward coefficient 0.2
Reward coefficient 0.2
Reward coefficient 0.2

(a) The actor network for . (b) The critic network for .

Figure 5. Actor and critic networks for .

Table 2. Hyperparameters for the DDPG Algorithm
Hyperparameter Value
Actor learning rate 5e-05
Critic learning rate 0.0005
Batch size 64
Noise Ornstein-Uhlenbeck Process with = 0.15, = 0.02
−3
Weights and Biases random uniform distribution [−3 × 10 , 3 × 10 ] (final layer),
−3
[ ]
Initialization − √ 1 , − √ 1 (other layers)

Table 3. FRL Specific Initial Hyperparameters
FRL type Aggregation method Hyperparmeter Value
Inter-FRL Gradients FRL update delay 0.1
Inter-FRL Gradients FRL cutoff ratio 0.8
Inter-FRL Weights FRL update delay 30
Inter-FRL Weights FRL cutoff ratio 1.0
Intra-FRL Gradients FRL update delay 0.4
Intra-FRL Gradients FRL cutoff ratio 0.5
Intra-FRL Weights FRL update delay 0.1
Intra-FRL Weights FRL cutoff ratio 1.0

performs simulations for a single 60 second episode using the trained models, calculating the cumulative re-
ward of the model(s) in the experiment. The entire project is designed and implemented using Python3, and

126 127 128 129 130 131 132 133 134 135 136