
Boin et al. Intell Robot 2022;2(2):145-67 | http://dx.doi.org/10.20517/ir.2022.11  Page 151



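$$
A_i = \begin{bmatrix} 0 & 1 & -h_i & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & -\frac{1}{\tau_i} & 0 \\ 0 & 0 & 0 & -\frac{1}{\tau_{i-1}} \end{bmatrix}, \quad
B_i = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{\tau_i} \\ 0 \end{bmatrix}, \quad
C_i = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \frac{1}{\tau_{i-1}} \end{bmatrix}. \tag{7}
$$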
               2.2. MDP model formulation
               The AV platooning problem can be formulated as an MDP problem, where the optimization objective is to
               minimize the previously defined $e_{pi}$, $e_{vi}$, $u_i$, and lastly jerk.
               2.2.1. State space

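               The state-space equation (6) can be discretized using the forward Euler method, giving the system equation
               below

$$
x_{i,k+1} = A_{Di}\, x_{i,k} + B_{Di}\, u_{i,k} + C_{Di}\, u_{i-1,k}, \tag{8}
$$

               where $x_{i,k} = [e_{pi,k}, e_{vi,k}, a_{i,k}, a_{i-1,k}]^{\top}$ is the observation state for the MDP problem that includes the position
               error $e_{pi,k}$, velocity error $e_{vi,k}$, acceleration $a_{i,k}$, and the acceleration of the predecessor vehicle $a_{i-1,k}$ at time
               step $k$. Moreover, $A_{Di}$, $B_{Di}$, and $C_{Di}$ are given as

$$
A_{Di} = \begin{bmatrix} 1 & T & -T h_i & 0 \\ 0 & 1 & -T & T \\ 0 & 0 & -\frac{T}{\tau_i} + 1 & 0 \\ 0 & 0 & 0 & -\frac{T}{\tau_{i-1}} + 1 \end{bmatrix}, \quad
B_{Di} = \begin{bmatrix} 0 \\ 0 \\ \frac{T}{\tau_i} \\ 0 \end{bmatrix}, \quad
C_{Di} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \frac{T}{\tau_{i-1}} \end{bmatrix}. \tag{9}
$$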
               2.2.2. Action space
               Each vehicle within a single-lane platoon follows the vehicle in front of it, and as such the only action the vehicle
               may take to maintain a desired headway is to accelerate or decelerate. The action for the system is defined as
               the control input $u_{i,k}$ to the vehicle.
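
To make the role of the action concrete, the sketch below writes the discretized dynamics component by component and applies one accelerating and one decelerating control input from the same state. The function name `apply_action` and all numerical values are hypothetical and chosen only for illustration. No action bounds are applied, since none are stated in this section.

```python
def apply_action(state, u, u_pred, h, tau, tau_pred, T):
    """Advance the state (e_p, e_v, a, a_pred) by one forward Euler step.

    u is the follower's control input (the action); u_pred is the
    predecessor's control input.
    """
    e_p, e_v, a, a_pred = state
    return (
        e_p + T * (e_v - h * a),                    # position error
        e_v + T * (a_pred - a),                     # velocity error
        a + T * (u - a) / tau,                      # follower acceleration
        a_pred + T * (u_pred - a_pred) / tau_pred,  # predecessor acceleration
    )


# Illustrative parameters (assumed, not from the paper).
params = dict(h=1.0, tau=0.5, tau_pred=0.5, T=0.1)
state = (0.0, 0.0, 0.0, 0.0)

accelerate = apply_action(state, u=1.0, u_pred=0.0, **params)
decelerate = apply_action(state, u=-1.0, u_pred=0.0, **params)
```

In this model the action changes the acceleration first; the position and velocity errors respond only on subsequent steps, because the control input enters through the first-order lag with time constant $\tau_i$.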
               2.2.3. Reward function
               Reward design is critical to the performance of a DDPG system. In
               the considered driving scenario, it is logical to minimize the position error, the velocity error, the amount of time
               spent accelerating, and the jerkiness of the driving motion. The proposed reward thus includes the normalized
               position error $e_{pi,k}$, velocity error $e_{vi,k}$, control input $u_{i,k}$, and lastly the jerk. The vehicle reward $r_{i,k}$ is given