Boin et al. Intell Robot 2022;2(2):145-67 | http://dx.doi.org/10.20517/ir.2022.11 Page 151
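$$
A_i = \begin{bmatrix} 0 & 1 & -h_i & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & -\frac{1}{\tau_i} & 0 \\ 0 & 0 & 0 & -\frac{1}{\tau_{i-1}} \end{bmatrix}, \quad
B_i = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{\tau_i} \\ 0 \end{bmatrix}, \quad
C_i = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \frac{1}{\tau_{i-1}} \end{bmatrix}. \tag{7}
$$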
2.2. MDP model formulation
The AV platooning problem can be formulated as an MDP problem, where the optimization objective is to
minimize the previously defined position error $e_{pi}$, velocity error $e_{vi}$, and lastly jerk.
2.2.1. State space
The state space formula (6) can be discretized using the forward Euler method, giving the system equation
below:
$$
x_{i,k+1} = A_D x_{i,k} + B_D u_{i,k} + C_D u_{i-1,k}, \tag{8}
$$
where $x_{i,k} = [e_{pi,k}, e_{vi,k}, a_{i,k}, a_{i-1,k}]^{\top}$ is the observation state for the MDP problem that includes the position
error $e_{pi,k}$, velocity error $e_{vi,k}$, acceleration $a_{i,k}$, and the acceleration of the predecessor vehicle $a_{i-1,k}$ at time
step $k$. Moreover, $A_D$, $B_D$, and $C_D$ are given as
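$$
A_D = \begin{bmatrix} 1 & T & -h_i T & 0 \\ 0 & 1 & -T & T \\ 0 & 0 & -\frac{T}{\tau_i} + 1 & 0 \\ 0 & 0 & 0 & -\frac{T}{\tau_{i-1}} + 1 \end{bmatrix}, \quad
B_D = \begin{bmatrix} 0 \\ 0 \\ \frac{T}{\tau_i} \\ 0 \end{bmatrix}, \quad
C_D = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \frac{T}{\tau_{i-1}} \end{bmatrix}. \tag{9}
$$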
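As an illustration of the discretization step, the following minimal sketch builds the continuous-time matrices of (7), applies the forward Euler rule $x_{k+1} = x_k + T\,\dot{x}_k$ (which gives $A_D = I + T A_i$, $B_D = T B_i$, $C_D = T C_i$), and advances the state by one step as in (8). It assumes that $T$ denotes the sampling interval, $h_i$ the time headway and $\tau_i$, $\tau_{i-1}$ the drivetrain time constants. The function names and the numerical parameter values are hypothetical and chosen only for this example.

```python
import numpy as np


def continuous_matrices(h, tau_i, tau_prev):
    """Continuous-time matrices of Eq. (7) for the state [e_p, e_v, a_i, a_{i-1}]."""
    A = np.array([
        [0.0, 1.0, -h, 0.0],
        [0.0, 0.0, -1.0, 1.0],
        [0.0, 0.0, -1.0 / tau_i, 0.0],
        [0.0, 0.0, 0.0, -1.0 / tau_prev],
    ])
    B = np.array([0.0, 0.0, 1.0 / tau_i, 0.0])
    C = np.array([0.0, 0.0, 0.0, 1.0 / tau_prev])
    return A, B, C


def forward_euler(A, B, C, T):
    """Forward Euler discretization: A_D = I + T*A, B_D = T*B, C_D = T*C."""
    return np.eye(A.shape[0]) + T * A, T * B, T * C


def step(x, u, u_prev, A_D, B_D, C_D):
    """One step of Eq. (8): x_{k+1} = A_D x_k + B_D u_k + C_D u_{i-1,k}."""
    return A_D @ x + B_D * u + C_D * u_prev


# Hypothetical parameter values, not taken from the paper.
h, tau_i, tau_prev, T = 1.0, 0.5, 0.5, 0.1

A, B, C = continuous_matrices(h, tau_i, tau_prev)
A_D, B_D, C_D = forward_euler(A, B, C, T)

# Start with a 1 m position error and apply a unit control input for one step.
x0 = np.array([1.0, 0.0, 0.0, 0.0])
x1 = step(x0, u=1.0, u_prev=0.0, A_D=A_D, B_D=B_D, C_D=C_D)
```

Computing $A_D$ as $I + T A_i$ rather than typing the entries of (9) by hand keeps the discrete model consistent with (7) if the continuous-time model is later changed.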
2.2.2. Action space
Each vehicle within a single-lane platoon follows the vehicle in front of it, and as such the only action the vehicle
may take to maintain a desired headway is to accelerate or decelerate. The action for the system is defined as
the control input $u_{i,k}$ to the vehicle.
2.2.3. Reward function
The design of a reward in a DDPG system is critical to providing good performance within the system. In
the considered driving scenario, it is logical to minimize position error, velocity error, the amount of time
spent accelerating, and the jerkiness of the driving motion. The proposed reward thus includes the normalized
position error $e_{pi,k}$, velocity error $e_{vi,k}$, control input $u_{i,k}$, and lastly the jerk. The vehicle reward $r_{i,k}$ is given