
Boin et al. Intell Robot 2022;2(2):145­67  I http://dx.doi.org/10.20517/ir.2022.11  Page 161

Figure 9. Results for a specific 60 s test episode using the 2-vehicle, 1-platoon environment trained using Intra-FRLWA.


Table 6. Performance after training across 4 random seeds for both Inter- and Intra-FRL. Each simulation result contains 600 time steps.

    Training Method   Seed 1   Seed 2   Seed 3   Seed 4   Average system reward   Standard deviation
    Inter-FRLGA       -2.79    -2.81    -3.05    -2.76    -2.85                   0.11
    Inter-FRLWA       -2.64    -2.88    -2.92    -2.93    -2.84                   0.12
    Intra-FRLGA       -2.85    -8.05    -4.23    -2.99    -4.53                   2.10
    Intra-FRLWA       -2.56    -2.60    -2.68    -2.75    -2.65                   0.07


stochastic input generated by the platoon leader. As vehicle 1 trains, vehicle 2 trains based on the policy of vehicle 1. As previously stated, Inter-FRL shares parameters amongst vehicles at the same index across platoons, whereas Intra-FRL provides the advantage of sharing parameters from preceding vehicles to following vehicles within a platoon. Our implementation of Intra-FRL includes directional parameter averaging. For exam-
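The directional parameter averaging described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the uniform weights, and the representation of each vehicle's policy as a single NumPy vector are all assumptions made for clarity. The key property shown is directionality — vehicle i only incorporates parameters from vehicles that precede it in the platoon.

```python
import numpy as np

def intra_frl_weighted_average(params, weights=None):
    """Directional parameter averaging (illustrative sketch).

    params:  list of per-vehicle parameter vectors, ordered leader -> tail.
    weights: optional per-vehicle weights; uniform if omitted.

    Vehicle i's new parameters are a weighted average over vehicles
    0..i only, so information flows from preceding vehicles to
    following vehicles, never the reverse.
    """
    n = len(params)
    if weights is None:
        weights = [1.0] * n
    new_params = []
    for i in range(n):
        w = np.array(weights[: i + 1], dtype=float)
        w /= w.sum()                          # normalize over vehicles 0..i
        stacked = np.stack(params[: i + 1])   # shape: (i+1, param_dim)
        new_params.append((w[:, None] * stacked).sum(axis=0))
    return new_params

# Example: a 2-vehicle platoon with toy 2-dimensional "policies".
p = [np.array([1.0, 1.0]), np.array([3.0, 5.0])]
out = intra_frl_weighted_average(p)
# Vehicle 1 (the first follower) keeps its own parameters unchanged;
# vehicle 2 averages its parameters with vehicle 1's.
```

Because the average for vehicle i excludes all vehicles j > i, the leader-side policy is never perturbed by less-trained followers, which is consistent with the advantage attributed to Intra-FRL in the text.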