
Boin et al. Intell Robot 2022;2(2):145­67  I http://dx.doi.org/10.20517/ir.2022.11  Page 161

Figure 9. Results for a specific 60 s test episode using the 2-vehicle, 1-platoon environment trained using Intra-FRLWA.


Table 6. Performance after training across 4 random seeds for both Inter- and Intra-FRL. Each simulation result contains 600 time steps.

    Training Method   Seed 1   Seed 2   Seed 3   Seed 4   Average system reward   Standard deviation
    Inter-FRLGA       -2.79    -2.81    -3.05    -2.76    -2.85                   0.11
    Inter-FRLWA       -2.64    -2.88    -2.92    -2.93    -2.84                   0.12
    Intra-FRLGA       -2.85    -8.05    -4.23    -2.99    -4.53                   2.10
    Intra-FRLWA       -2.56    -2.60    -2.68    -2.75    -2.65                   0.07


stochastic input generated by the platoon leader. As vehicle 1 trains, vehicle 2 trains based on the policy of vehicle 1. As previously stated, Inter-FRL shares parameters amongst vehicles at the same index across platoons, whereas Intra-FRL provides the advantage of sharing parameters from preceding vehicles to following vehicles within a platoon. Our implementation of Intra-FRL includes directional parameter averaging. For exam-
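The directional parameter averaging described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the uniform weights, and the representation of each vehicle's policy as a single NumPy vector are all assumptions made for clarity. The key property shown is directionality — vehicle i only incorporates parameters from vehicles that precede it in the platoon.

```python
import numpy as np

def intra_frl_weighted_average(params, weights=None):
    """Directional parameter averaging (illustrative sketch).

    params:  list of per-vehicle parameter vectors, ordered leader -> tail.
    weights: optional per-vehicle weights; uniform if omitted.

    Vehicle i's new parameters are a weighted average over vehicles
    0..i only, so information flows from preceding vehicles to
    following vehicles, never the reverse.
    """
    n = len(params)
    if weights is None:
        weights = [1.0] * n
    new_params = []
    for i in range(n):
        w = np.array(weights[: i + 1], dtype=float)
        w /= w.sum()                          # normalize over vehicles 0..i
        stacked = np.stack(params[: i + 1])   # shape: (i+1, param_dim)
        new_params.append((w[:, None] * stacked).sum(axis=0))
    return new_params

# Example: a 2-vehicle platoon with toy 2-dimensional "policies".
p = [np.array([1.0, 1.0]), np.array([3.0, 5.0])]
out = intra_frl_weighted_average(p)
# Vehicle 1 (the first follower) keeps its own parameters unchanged;
# vehicle 2 averages its parameters with vehicle 1's.
```

Because the average for vehicle i excludes all vehicles j > i, the leader-side policy is never perturbed by less-trained followers, which is consistent with the advantage attributed to Intra-FRL in the text.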