Figure 2. Overlay of original images (left) with corresponding outputs of human detection (middle) and HMR models (right) across different stages of the simulated surgery. HMR: Human mesh recovery.

(1) Movement patterns and distance traversal, in accordance with the association between movements and surgical stage transitions;
(2) Changes in positioning relative to the room, due to the collective emphasis on optimal OR layout and minimization of collision points to facilitate smooth flow patterns[5];
(3) Visual attention switches over time, due to the importance of task focus in the OR[22].

Movements and positioning
To analyze movement patterns and changes in positioning, we approximated the position of each mesh in each frame by the predicted pelvis joint of the mesh. Using these estimated positions, we constructed OR flow maps and associated heat maps to visualize tracklet trajectories and compute statistics on movement patterns, such as cumulative distance traversed.
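
A minimal sketch of this computation follows (not the authors' code; the floor-plane coordinates, grid resolution, and `room_extent` bounds are illustrative assumptions). Given one tracklet's per-frame pelvis positions, it produces the cumulative distance traversed and an occupancy grid of the kind underlying a heat map:

```python
# Minimal sketch: per-tracklet movement statistics from pelvis positions.
# Assumes pelvis_xy holds the (x, y) floor-plane coordinates of one
# tracklet's pelvis joint in every frame; units and grid size are
# illustrative, not taken from the paper.
import numpy as np

def movement_stats(pelvis_xy: np.ndarray, room_extent=(6.0, 6.0), bins=60):
    """pelvis_xy: (T, 2) array of per-frame pelvis positions."""
    # Cumulative distance traversed: sum of Euclidean steps between
    # consecutive frames.
    steps = np.linalg.norm(np.diff(pelvis_xy, axis=0), axis=1)
    cumulative_distance = steps.sum()

    # Occupancy heat map: bin the trajectory onto a fixed grid covering
    # the room, yielding the flow/heat map visualization.
    heat_map, _, _ = np.histogram2d(
        pelvis_xy[:, 0], pelvis_xy[:, 1],
        bins=bins, range=[[0.0, room_extent[0]], [0.0, room_extent[1]]],
    )
    return cumulative_distance, heat_map
```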


Visual attention
To compute the visual attention field of an individual i at timestep t, we first calculated the midpoint µ ∈ ℝ³ between the left eye joint L ∈ ℝ³ and the right eye joint R ∈ ℝ³. With the neck joint N ∈ ℝ³, we then constructed a plane P ⊂ ℝ³ that was perpendicular to L, R, and N while simultaneously anchored by µ. Finally, we normalized across the resultant plane to obtain our view direction v_i^t ∈ ℝ³. To detect a potential attention switch at timestep t2, AS_i^{t2} ∈ {0, 1}, we measured the cosine similarity between the viewing directions v_i^{t1} and v_i^{t2}, where t1 and t2 are timesteps one-third of a second apart. We defined a switch in attention as a direction change of more than 45 degrees.
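
A minimal sketch of this computation follows (not the authors' implementation; in particular, reading the view direction as the unit normal of the plane through the two eye joints and the neck joint is our assumption about the construction described above):

```python
# Minimal sketch: view direction and attention-switch detection.
# The view direction is taken as the unit normal of the plane through
# the left-eye, right-eye, and neck joints, anchored at the eye midpoint
# (an assumed reading of the text); a switch is flagged when the
# direction rotates by more than 45 degrees between the two timesteps,
# which the text sets one-third of a second apart.
import numpy as np

def view_direction(left_eye, right_eye, neck):
    """Each argument is a 3D joint position; returns a unit view vector."""
    mu = 0.5 * (left_eye + right_eye)        # midpoint of the eye joints
    normal = np.cross(right_eye - left_eye,  # normal of the plane through
                      neck - mu)             # the eyes and the neck
    return normal / np.linalg.norm(normal)

def attention_switch(v_t1, v_t2, threshold_deg=45.0):
    """AS indicator: 1 if the angle between the two unit view
    vectors exceeds the threshold, else 0."""
    cos_sim = np.clip(np.dot(v_t1, v_t2), -1.0, 1.0)
    return int(np.degrees(np.arccos(cos_sim)) > threshold_deg)
```
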
Recovering actions from mesh sequences
To demonstrate the utility of our HMR framework for downstream surgical prediction tasks that rely on a physical understanding of the scene, we trained and evaluated a deep learning model to perform a multi-