
The informative directed coverage path planning is shown in Figure 5. In Figure 5A, the planner weighs different objectives (e.g., rapidly covering the space, as driven by DCPP, or moving to the most likely area, as driven by the heat map) and optimizes the multi-objective problem to choose a preferred solution from a given set of optimal solutions. The preferred solution is then reflected as actions planned in the workspace, and the learner further improves it through back-propagation. Finally, the most promising solution is determined based on rewards, and a mortality-based coverage path is constructed accordingly. A specific example of the IPP is depicted in Figure 5B. The major component is a primitive path, which is built under the robot's kinematic constraints and divided into five sub-branches, each connected by two nodes. The path is constructed incrementally during each iteration, and the most informative nodes are explored and selected first for the area with the highest probability of broiler mortality appearance. The rewards are only utilized in the selected nodes for optimization. The framework consists of selection, expansion, simulation, and back-propagation in each iteration. During selection, starting at the initial position, target nodes are recursively selected until an expandable node is encountered. During expansion, each expandable node is subdivided into five sub-branches. During simulation, the multi-objective function is evaluated at each expandable node to obtain a reward vector. Finally, during back-propagation, the rewards are backed up in each selected and expandable node.
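A minimal Python sketch of this four-phase loop is given below to make the procedure concrete. It is illustrative only: the class and function names, the rollout length, and the scalarized reward (the Pareto selection over a reward vector used in the paper is simplified here) are assumptions, not the authors' implementation.

```python
import math
import random

class Node:
    """One node of the primitive path tree (e.g., a robot pose along a sub-branch)."""
    def __init__(self, state, parent=None):
        self.state = state        # assumed: robot pose or grid cell
        self.parent = parent
        self.children = []
        self.visits = 0
        self.reward_sum = 0.0     # accumulated (scalarized) reward

def ucb1(node, c=1.4):
    """Upper confidence bound used to rank children during selection."""
    if node.visits == 0:
        return float("inf")
    exploit = node.reward_sum / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def select(node):
    """Selection: descend from the root until a node with no children is reached."""
    while node.children:
        node = max(node.children, key=ucb1)
    return node

def expand(node, successors):
    """Expansion: add the kinematically feasible successor nodes (five sub-branches)."""
    for next_state in successors(node.state):
        node.children.append(Node(next_state, parent=node))
    return random.choice(node.children)

def simulate(node, rollout_step, steps=10):
    """Simulation: roll out a short trajectory and accumulate coverage/information reward."""
    state, total = node.state, 0.0
    for _ in range(steps):
        state, reward = rollout_step(state)
        total += reward
    return total

def backpropagate(node, reward):
    """Back-propagation: push the reward up through every ancestor on the selected path."""
    while node is not None:
        node.visits += 1
        node.reward_sum += reward
        node = node.parent

def plan(root, successors, rollout_step, iterations=1000):
    """Run the four-phase loop and return the most promising first action."""
    for _ in range(iterations):
        leaf = select(root)
        if leaf.visits > 0 or leaf is root:   # expand previously visited leaves (and the root)
            leaf = expand(leaf, successors)
        reward = simulate(leaf, rollout_step)
        backpropagate(leaf, reward)
    return max(root.children, key=lambda n: n.visits)
```

In practice, `successors` would generate the five kinematically feasible sub-branches from the current pose, and `rollout_step` would combine the coverage and heat-map objectives into the reward.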


Note that the information in this paper is defined by the broiler mortality historical data for the past 30 days. Since the historical data, used as a prior map, also carry uncertainty, we utilize a Gaussian process (GP) to transform them into a continuous information map. The amount of information in each grid varies with the number of dead broilers detected. To ensure that the robot's path tends toward the next planned grid, we set the information at the adjacent edge to the average value of the current grid information, which decreases proportionally with distance in the direction opposite to the DCPP within the grid. This additional edge information can be clearly seen in the red dashed rectangle near the edge of the next grid [Figure 6A]. The planned path is therefore an optimized trade-off between its length and the information gain along it, two objectives that might be conflicting in nature. Note that the IPP is only executed within each grid. The path obtained by DCPP thus becomes an information-oriented guide in each grid, leading the robot to the next grid. The resultant trajectory in the grid is shown in Figure 6B.
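As a rough illustration of how discrete mortality counts could be smoothed into a continuous information map with a GP, consider the sketch below. The kernel, grid resolution, coordinates, and counts are assumed values for illustration, not the authors' configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Historical mortality data for the past 30 days:
# (x, y) grid-cell centers and the number of dead broilers detected there (assumed values).
xy_hist = np.array([[1.0, 2.0], [3.5, 2.0], [6.0, 4.5], [2.0, 7.0]])
deaths = np.array([3, 1, 5, 2], dtype=float)

# GP regression smooths the noisy counts into a continuous prior over the barn.
kernel = RBF(length_scale=1.5) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(xy_hist, deaths)

# Evaluate the information map on a dense grid covering the workspace (10 m x 8 m assumed).
xs, ys = np.meshgrid(np.linspace(0, 10, 100), np.linspace(0, 8, 80))
query = np.column_stack([xs.ravel(), ys.ravel()])
mean, std = gp.predict(query, return_std=True)

# Information grows with the expected mortality and with its uncertainty,
# so poorly observed regions still attract the planner.
info_map = (mean + std).reshape(xs.shape)
```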


The time complexity depends on the Monte Carlo Tree Search (MCTS) tree depth and the steps of each simulation. Assume that in an arbitrary iteration t, the tree depth is d_t. Then, the total number of actions is O(d_t × a), where a is the number of defined primitive actions. Assume each simulation process costs O(s), which is a problem-dependent value (e.g., the steps of a simulation are preset depending on the simulation time budget, considering that longer simulations are more accurate, while shorter simulations return a result more quickly). Then, the complexity of one search iteration is O(d_t × a + s). If we proceed for n iterations, the complexity is O(n(d_t × a + s)). For multiple-objective computation, the complexity becomes O(n(d_t × a² + s)) due to the extra computation
required in the MCTS selection step. For a practical runtime assessment, a concrete example is a search of 1000 iterations with five available actions on a standard laptop: MCTS takes around 0.3 s, and multi-objective MCTS (formally ParetoMCTS) needs about 1 s. With a faster computer, the runtime would be shorter still. This is sufficient for real-time decision making.
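To make the expressions above concrete, a back-of-the-envelope count for the quoted setting (n = 1000 iterations, a = 5 actions) might look as follows. The tree depth d_t and simulation cost s are assumed values chosen only for illustration, and the multi-objective term follows the a² factor reconstructed above.

```python
# Rough operation counts for the complexity expressions above (illustrative numbers only).
n   = 1000   # search iterations
a   = 5      # primitive actions per expansion
d_t = 20     # assumed tree depth reached during selection
s   = 50     # assumed steps per rollout simulation

single_objective = n * (d_t * a + s)        # O(n(d_t * a + s))   -> 150,000 steps
multi_objective  = n * (d_t * a**2 + s)     # O(n(d_t * a^2 + s)) -> 550,000 steps

print(single_objective, multi_objective)
```

For these assumed values, the multi-objective count is roughly 3 to 4 times the single-objective one, broadly in line with the 0.3 s versus 1 s runtimes quoted above.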

               3.4. Target detection
Target detection localizes the dead birds in real time [35,36]. The dead birds are detected by the YOLO V4 dead bird detector [28] [Figure 7]. The images are captured by the camera and fed into the detector. The detector should first be well trained and validated on the captured images. The trained YOLO V4 then takes an acquired image as input and converts it into a map with default grids. Predefined anchors are tiled onto each grid cell, and predictions of bounding boxes, along with confidences and class names, are made accordingly. With the non-maximum suppression rule, the bounding box with the highest confidence is retained for dead bird detection. Finally, the detection results, including indices, centroid coordinates, and
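A minimal sketch of the confidence-based non-maximum suppression step described above is given below. The (x1, y1, x2, y2) box format, the IoU threshold, and the function names are assumptions for illustration, not the detector's internal implementation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep only the highest-confidence box among heavily overlapping predictions.

    detections: list of (box, confidence) pairs for the 'dead bird' class.
    """
    kept = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, conf))
    return kept
```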