Human and Activity Detection in Ambient Assisted Living Scenarios
Human activity recognition (HAR) is crucial in assistive technology and human-robot interaction (HRI), as it enables robots and assistive devices to understand and respond to an individual's movements and actions, facilitating personalised assistance for people with mobility challenges or disabilities. In HAR for ambient assisted living (AAL) environments, integrating additional cameras alongside the robot's own perspective has the potential to significantly improve detection outcomes. However, given the computational limits of robots, processing additional video streams raises challenges in both computational complexity and data integration.

The primary goal of this research is to create an efficient multi-view skeleton-based HAR system that optimises accuracy without sacrificing efficiency. By leveraging the strengths of skeleton-based models and incorporating diverse perspectives, the system aims to enhance overall performance in AAL scenarios. To support this goal, an open dataset of skeletal data for HAR is developed and utilised. For objective evaluation, this work considers the computational needs and algorithmic efficiency of HAR methods, exploring the potential of multi-view systems to improve human-robot interaction.

This thesis is grounded in a thorough literature review, including extensive dataset analysis. It explores pivotal research questions centred on the effectiveness of skeleton-based models in multi-view settings compared to image-based models, the role of individual perspectives in multi-view HAR, and the optimal models for multi-view recognition. These inquiries lead to the development and evaluation of a novel lightweight multi-view HAR architecture. The thesis contributes to the field a multi-view skeleton-based dataset, dataset analysis metrics for evaluating and comparing different perspectives, and a novel lightweight HAR architecture.

Performance analysis supports the importance of integrating robot vision with observations from additional cameras, and the results reveal variations in performance across views. For instance, the robot's tracking of the human subject during action performance can yield higher-quality data in activities such as ascending and descending stairs, while proximity to the subject may result in missed body joints. Conversely, other views offer a wider perspective of the scene, presenting their own advantages and challenges. In this study, integrating the additional view with the Robot-view increased accuracy by up to 25%. Notably, the proposed skeleton-based architecture is more efficient than its image-based counterpart on the same dataset, with a 15% improvement in HAR accuracy. Moreover, the comparison between image-based and skeleton-based methods reveals that robot movement affects them differently: in image-based methods, the motion of the robot's camera can be confused with the subject's own movement, whereas the skeleton-based method is far less affected by the robot's movement. In addition, this work introduces several multi-view architectures for comparative analysis, shedding light on different data combination methodologies.
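As an illustration only, the sketch below shows one plausible way such a view-fusion design could be structured: a lightweight per-view skeleton encoder whose features are concatenated before classification. All names, layer sizes, joint counts, and class counts are hypothetical placeholders chosen for the example, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class MultiViewSkeletonHAR(nn.Module):
    """Hypothetical sketch: one small 1D-CNN encoder per camera view,
    fused by feature concatenation before classification."""
    def __init__(self, num_joints=17, coords=2, num_classes=10):
        super().__init__()
        in_ch = num_joints * coords  # flatten joints per frame into channels
        def encoder():
            return nn.Sequential(
                nn.Conv1d(in_ch, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(64, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # pool over the time axis
            )
        self.robot_view = encoder()   # encoder for the robot's camera
        self.static_view = encoder()  # encoder for the additional camera
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, robot_seq, static_seq):
        # each input: (batch, num_joints * coords, time)
        r = self.robot_view(robot_seq).squeeze(-1)
        s = self.static_view(static_seq).squeeze(-1)
        return self.classifier(torch.cat([r, s], dim=1))

# usage: two synchronised 2-second skeleton clips at 30 fps (60 frames)
model = MultiViewSkeletonHAR()
robot = torch.randn(8, 17 * 2, 60)
static = torch.randn(8, 17 * 2, 60)
logits = model(robot, static)  # shape: (8, 10)
```

Concatenation at the feature level is only one of the data combination strategies a multi-view system might use; fusing at the input level or at the decision (score) level are common alternatives.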
The proposed system achieves high accuracy (approximately 90%) with a minimal number of training parameters (0.6M) and significantly lower computational demands (0.00106 GFLOPs) compared to well-known CNN and GCN models. For instance, ResNet with 11.2M parameters and MobileNet with 2.2M parameters achieved 90.9% and 83.5% accuracy, respectively. Given the pivotal role of HAR in diverse applications, the emphasis on its efficiency and effectiveness is crucial. This work not only addresses these concerns but also establishes a foundation for future research. The proposed skeleton-based architecture lays the groundwork for applications such as ambient assisted living scenarios, offering a flexible platform for the development of efficient multi-person activity recognition, continual and real-time HAR, and use in HRI studies.
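For readers who want to approximate the baseline parameter counts, a small script using off-the-shelf torchvision reference models is sketched below. The exact figures depend on the model variant and classification head, so the 11.2M and 2.2M values reported above likely correspond to adapted configurations rather than these stock models.

```python
from torchvision.models import resnet18, mobilenet_v2

def count_params(model):
    """Total trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Stock torchvision baselines; counts differ slightly from the thesis
# figures, which presumably use task-specific classification heads.
print(f"ResNet-18:   {count_params(resnet18()):.1f}M params")      # ~11.7M
print(f"MobileNetV2: {count_params(mobilenet_v2()):.1f}M params")  # ~3.5M
```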
| Item Type | Article |
| --- | --- |
| Keywords | Human Activity Recognition (HAR); Skeleton-based HAR; Ambient Assisted Living (AAL); Human-Robot Interaction (HRI); Convolutional Neural Network (CNN); Human Body Pose Estimation; Efficient Deep Learning (DL) |
| Date Deposited | 28 May 2025 22:26 |
| Last Modified | 28 May 2025 22:26 |