Physical Distancing and Mask Wearing Behavior Dataset Generator from CCTV Footages Using YOLOv8

Document Type

Conference Proceeding

Publication Date



Computer simulations using an agent-based approach to model human behavior require a robust dataset derived from actual observation to serve as ground truth. This paper details an approach for developing a movement behavior dataset generator from CCTV footage with respect to two health-related behaviors, face mask wearing and physical distancing, while addressing the privacy concerns of confidential CCTV data. A two-stage YOLOv8-based cascaded approach was implemented for object detection and tracking. The first stage tracks individuals in the video feed to determine physical distancing behavior, using the pre-trained YOLOv8 xLarge model paired with the BoT-SORT multi-object tracker and OpenCV Perspective-n-Point (PnP) pose estimation. The second stage determines the mask wearing behavior of the tracked individuals using the best-performing of the five YOLOv8 models (nano, small, medium, large, and xLarge), each trained for 50 epochs on a custom CCTV dataset. Results show that the custom-trained xLarge model performed best on the mask detection task, with mAP50 = 0.94, mAP50-95 = 0.63, and F1 = 0.872. The faces of all tracked individuals are blurred out in the resulting video frames to preserve the privacy of the CCTV data. Finally, the developed system generates the corresponding mask-distancing behavior dataset and annotated output videos from the raw input CCTV footage.
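The distancing stage above maps each tracked person to a ground-plane position (via PnP pose estimation) and flags pairs who are too close. The abstract does not give the exact rule, so the following is only a minimal sketch: the `distancing_violations` helper, the position format, and the 1.0 m threshold are all illustrative assumptions, not the paper's actual method.

```python
from itertools import combinations

def distancing_violations(positions, min_dist_m=1.0):
    """Given ground-plane positions {track_id: (x_m, y_m)} for one frame
    (e.g. projected from image coordinates with PnP), return the set of
    track IDs closer than min_dist_m metres to at least one other person.
    The 1.0 m threshold is an assumed value for illustration only."""
    violators = set()
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in combinations(positions.items(), 2):
        # Euclidean distance between the two people on the ground plane
        if ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5 < min_dist_m:
            violators.update((id_a, id_b))
    return violators

# Three tracked people (positions in metres); persons 1 and 2 are 0.5 m apart
frame = {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (5.0, 5.0)}
print(sorted(distancing_violations(frame)))  # → [1, 2]
```

Running such a check per frame, keyed by the tracker's persistent IDs, yields a per-person distancing label over time that can be written out as the behavior dataset.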
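The privacy-preserving blurring step can be sketched as an in-place obfuscation of each detected face box. This is a hedged stand-in only: the `blur_roi` helper and the pixelation approach are assumptions, standing in for whatever blur (e.g. an OpenCV Gaussian blur) the system actually applies to the face regions.

```python
import numpy as np

def blur_roi(frame, x, y, w, h, k=15):
    """Pixelate the (x, y, w, h) region of an H x W x 3 uint8 frame in
    place, as a simple substitute for the face blurring described in the
    paper. Coarseness k is an illustrative parameter."""
    roi = frame[y:y + h, x:x + w]
    # Sample every k-th pixel, then expand each sample back to a k x k block
    small = roi[::k, ::k]
    frame[y:y + h, x:x + w] = np.repeat(np.repeat(small, k, axis=0),
                                        k, axis=1)[:h, :w]
    return frame
```

Applied to every tracked face box before the annotated video is written out, this leaves the rest of the frame (bounding boxes, track IDs, behavior labels) intact while making faces unrecoverable.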