Figure 2. High-level flow diagram of the DDPG model for a general vehicle i in a platoon.


below, where $a$, $b$ and $c$ are system hyperparameters.
$$
r_{i,k} = -\left( a\,\frac{|e_{p_{i,k}}|}{\max(e_{p_{i,k}})} + b\,\frac{|e_{v_{i,k}}|}{\max(e_{v_{i,k}})} + c\,\frac{|u_{i,k}|}{\max(u_{i,k})} + \frac{|\dot{u}_{i,k}|}{2\max(\dot{u}_{i,k})} \right) \tag{10}
$$
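For concreteness, a minimal sketch of how a reward of the form in Eq. (10) might be evaluated in code, assuming the $\max(\cdot)$ terms are fixed normalization bounds (the maximum admissible magnitude of each quantity). The function name, argument names and default hyperparameter values are illustrative placeholders, not the paper's configuration.

```python
def platoon_reward(e_p, e_v, u, u_dot,
                   e_p_max, e_v_max, u_max, u_dot_max,
                   a=1.0, b=1.0, c=1.0):
    """Negated, normalized cost in the style of Eq. (10).

    e_p, e_v : position and velocity error of the vehicle at this step
    u, u_dot : control input and its rate of change
    *_max    : normalization bounds (maximum admissible magnitudes), assumed fixed
    a, b, c  : system hyperparameters weighting the error terms (placeholder values)
    """
    return -(a * abs(e_p) / e_p_max
             + b * abs(e_v) / e_v_max
             + c * abs(u) / u_max
             + abs(u_dot) / (2.0 * u_dot_max))
```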


               2.3. FRL DDPG algorithm
In this section, the design for implementing the FRL DDPG algorithm on the AV platooning problem is presented.



               2.3.1. DDPG model description
The DDPG algorithm is composed of an actor, $\mu$, and a critic, $Q$. The actor produces actions $a_k \in U$ given some observation $s_k \in X$, and the critic makes judgements on those actions while training using the Bellman equation [12,24]. The actor is updated by the policy gradient [24]. The critic network uses its weights $\theta^Q$ to approximate the optimal action-value function $Q(s, a \mid \theta^Q)$ [24]. The actor network uses weights $\theta^\mu$ to represent the agent's current policy $\mu(s \mid \theta^\mu)$ for the action-value function [24]. The actor $\mu(s): X \rightarrow U$ maps the observation to the action. Experience replay is used to mitigate the issue of training samples not being independent and identically distributed due to their generation from sequential explorations [24]. Two additional models, the target actor $\mu'$ and target critic $Q'$, are used in DDPG to stabilize the training of the actor and critic networks by updating their parameters slowly based on the target update coefficient $\tau$. A sufficient value of $\tau$ is chosen such that stable training of $\mu$ and $Q$ is observed. Figure 2 provides a high-level, simplified overview of how the DDPG algorithm interacts with a single vehicle in a platoon.
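The slow target update based on $\tau$ can be illustrated with a short sketch in PyTorch-style Python. The network architectures, state/action dimensions and the value of $\tau$ below are illustrative assumptions, not the configuration used in the paper.

```python
import copy
import torch

def soft_update(target_net, source_net, tau):
    """Polyak averaging: theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    for t_param, s_param in zip(target_net.parameters(), source_net.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)

# Actor mu(s) -> action and critic Q(s, a) -> value; layer sizes are placeholders.
actor = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1), torch.nn.Tanh())
critic = torch.nn.Sequential(torch.nn.Linear(3 + 1, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 1))

# Target networks start as exact copies and then track the online networks slowly.
target_actor = copy.deepcopy(actor)
target_critic = copy.deepcopy(critic)

tau = 0.005  # target update coefficient (placeholder value)
# After each gradient step on the actor and critic:
soft_update(target_actor, actor, tau)
soft_update(target_critic, critic, tau)
```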



2.3.2. Inter- and intra-FRL
Modifications to the base DDPG algorithm are needed to implement Inter-FRL and Intra-FRL. To implement FedAvg, the following modifications are required (a sketch of the server-side averaging step follows the list):
               1. An FRL server: responsible for averaging the system parameters for use in a global update
               2. Model weight aggregation: storing of each model’s weights for use in aggregation
               3. Model gradient aggregation: storing of each model’s gradients for use in aggregation
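A minimal sketch of the server-side averaging step implied by items 1-3, assuming each agent exposes its model weights (or gradients) as a list of NumPy arrays with matching shapes. The function name fedavg and the data layout are illustrative assumptions.

```python
import numpy as np

def fedavg(parameter_sets):
    """FRL server step: element-wise average of each agent's parameters (or gradients).

    parameter_sets: list over agents; each entry is a list of numpy arrays
    (one array per model layer), with all agents sharing the same shapes.
    Returns the averaged parameters to broadcast back for the global update.
    """
    n_layers = len(parameter_sets[0])
    return [np.mean([agent[layer] for agent in parameter_sets], axis=0)
            for layer in range(n_layers)]

# Usage: aggregate the actor weights of a 3-vehicle platoon, then push the
# result back to every participating agent as its new actor parameters.
# global_actor_weights = fedavg([w_vehicle1, w_vehicle2, w_vehicle3])
```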