Follow-Bench: A Unified Motion Planning Benchmark for Socially-Aware Robot Person Following

Abstract

Robot person following (RPF), in which a mobile robot follows and assists a specific person, has emerging applications in personal assistance, security patrols, eldercare, and logistics. To be effective, such robots must follow the target while ensuring safety and comfort for both the target and surrounding people. In this work, we present the first comprehensive study of RPF, which (i) surveys representative scenarios, motion-planning methods, and evaluation metrics with a focus on safety and comfort; (ii) introduces Follow-Bench, a unified benchmark simulating diverse scenarios, including various target trajectory patterns, crowd dynamics, and environmental layouts; and (iii) re-implements six representative RPF planners, ensuring that both safety and comfort are systematically considered. Moreover, we evaluate the two best-performing planners from our benchmark on a differential-drive robot to provide insights into real-world deployment of RPF planners. Extensive simulation and real-world experiments provide a quantitative study of the safety-comfort trade-offs of existing planners, while revealing open challenges and future research directions.

Review

We focus on two critical and quantifiable requirements that are frequently prioritized, safety and comfort:

  • Safety: An objective requirement, ensuring that the robot avoids collisions while maintaining continuous observation of the target person.
  • Comfort: In contrast, this is more subjective and is typically reflected in the robot's motion patterns and the relative distances between the robot, the target person, and surrounding pedestrians (i.e., proxemics).
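As a rough illustration of how these two requirements can be turned into numbers, the sketch below computes per-timestep indicators from 2D positions. It is not the metric set used in Follow-Bench: the function names, the body radii, and the zone boundaries (taken from Hall's commonly cited proxemic zones) are all assumptions chosen for the example.

```python
import math

# Assumed values (not from the source): body radii in metres and the upper
# bounds of Hall's intimate, personal, and social zones.
ROBOT_RADIUS = 0.3
PERSON_RADIUS = 0.3
ZONES = [(0.45, "intimate"), (1.2, "personal"), (3.6, "social")]


def proxemic_zone(distance):
    """Return the proxemic zone name for a centre-to-centre distance."""
    for upper, name in ZONES:
        if distance < upper:
            return name
    return "public"


def step_metrics(robot, target, pedestrians):
    """Compute simple safety and comfort indicators for one timestep.

    robot, target: (x, y) positions; pedestrians: list of (x, y) positions.
    """
    people = [target] + list(pedestrians)
    dists = [math.dist(robot, p) for p in people]
    return {
        # Safety: any overlap between the robot disc and a person disc.
        "collision": any(d < ROBOT_RADIUS + PERSON_RADIUS for d in dists),
        # Comfort: which zone of the target the robot currently occupies.
        "target_zone": proxemic_zone(dists[0]),
        # Comfort: number of bystanders whose intimate zone is intruded.
        "intrusions": sum(1 for d in dists[1:] if proxemic_zone(d) == "intimate"),
        "min_distance": min(dists),
    }
```

Calling `step_metrics` at every simulation step and aggregating the results (for example, collision rate or the fraction of time spent in the target's personal zone) would give episode-level scores. Visibility of the target, which the safety requirement also covers, is not modelled here.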

Guided by these two requirements, we comprehensively review RPF-related scenarios that challenge safety and comfort, and the evaluation metrics used to assess performance with respect to these requirements. Furthermore, the reviewed RPF planners are analyzed with an emphasis on how they are designed to address these two key requirements. To enable systematic performance evaluation, we introduce Follow-Bench, a unified benchmark for evaluating RPF planners under diverse conditions, including various target trajectory patterns, pedestrian-flow patterns, and environmental layouts. We re-implement six popular RPF planners, ensuring that both safety and comfort are systematically considered: MPC-based, MPC w/ Traj., MPC w/ DS., SFM-based, DWA-based, and DWA w/ Traj.

Benchmark Results

Real-world Experiments

Challenges and Future Directions

  • Balancing Trade-offs between Safety and Comfort: Most RPF planners struggle with the trade-off between keeping a socially comfortable distance to the target and avoiding collisions, especially in crowded or cluttered environments. Fixed point/trajectory tracking often leads to occlusions and safety risks, while optimization-based planners face computational challenges in dynamic settings. Promising directions include (i) adaptive trajectory prediction that integrates human motion forecasts and visibility constraints, (ii) convexified or sampling-based MPC to reduce real-time complexity, and (iii) hierarchical planning that separates high-level candidate generation from low-level collision-free execution.
  • Improving Spatial-temporal Obstacle Representation, Prediction and Planning: Current polygon-based clustering works for discrete clutter but struggles with continuous structures (e.g., walls, doorways) and may merge targets with nearby obstacles. Future work should develop more robust obstacle representations and adaptive safe distances for different spatial contexts. Incorporating not only target trajectories but also predictions of surrounding pedestrians, using learned models such as Social-GAN, can enable smoother, socially aware planning that balances visibility, safety, and proxemics.
  • Discovering and Utilizing Environmental Patterns: Beyond short-term trajectory prediction, RPF can benefit from recognizing long-term crowd and environmental patterns. For example, following pedestrian flow at crosswalks or detouring around dense formations allows a safer, more comfortable re-approach to the target. A promising direction is to integrate high-level pattern recognition with low-level planning to balance visibility, safety, and comfort in complex environments.
  • Towards More Expressive Intermediate Representations: Point- or trajectory-based references often fail under occlusions, while cost maps and potential fields lack distance encoding and are costly to update. Euclidean signed distance fields (ESDFs) provide smooth, distance-aware planning but remain computationally heavy in dynamic settings. Future directions include designing intermediate representations that jointly encode visibility, safety, and social constraints, potentially using learning-based methods for compact and adaptive modeling.
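To make the idea of separating high-level candidate generation from low-level execution more concrete, the following minimal sketch samples candidate following positions on a ring around the target and keeps only those that satisfy a clearance margin (safety) and an unobstructed line of sight to the target (visibility). It is an illustrative assumption rather than any planner from the benchmark: obstacles are modelled as circles, and the names `segment_hits_circle` and `best_follow_position`, along with the default distances, are hypothetical.

```python
import math


def segment_hits_circle(a, b, center, radius):
    """True if the segment a-b passes within `radius` of `center`."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(a, center) < radius
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = ((center[0] - ax) * dx + (center[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    closest = (ax + t * dx, ay + t * dy)
    return math.dist(closest, center) < radius


def best_follow_position(robot, target, obstacles,
                         follow_dist=1.2, clearance=0.5, n=16):
    """Pick a feasible following position near the robot.

    obstacles: list of ((x, y), radius) circles.
    follow_dist, clearance, n: assumed tuning values for this sketch.
    Returns None when no sampled candidate is feasible.
    """
    feasible = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        cand = (target[0] + follow_dist * math.cos(angle),
                target[1] + follow_dist * math.sin(angle))
        # Safety: require a margin to every obstacle boundary.
        if any(math.dist(cand, c) - r < clearance for c, r in obstacles):
            continue
        # Visibility: the line of sight to the target must be clear.
        if any(segment_hits_circle(cand, target, c, r) for c, r in obstacles):
            continue
        feasible.append(cand)
    if not feasible:
        return None
    # Prefer the candidate that needs the least repositioning.
    return min(feasible, key=lambda c: math.dist(robot, c))
```

In this sketch the comfort requirement enters only through the fixed `follow_dist`. A fuller version would presumably score candidates against pedestrian predictions and hand the selected goal to a low-level collision-avoiding controller such as MPC or DWA.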