Towards Collaborative Multimodal Federated Learning for Human Activity Recognition in Smart Workplace Environments
This paper aims to improve human activity recognition (HAR) with multimodal data collected across multiple consumer devices in a smart workplace environment. Leveraging the sensor-rich capabilities of smartphones, smartwatches, and smart speakers, we propose the Collaborative Multimodal Federated Learning (CoMFL) algorithm, which performs efficient feature encoding with lightweight local models on the consumer devices and fuses the encoded features to train a super model on a personalized local server, all within the user's private zone. Federated learning then aggregates model updates across users without compromising privacy, yielding a generalized global super model. We also address the challenge of missing modalities by incorporating a feature reconstruction network, which attempts to reconstruct the features of missing modalities prior to fusion, improving performance when some modalities are unavailable. Our proposed CoMFL achieves significant performance gains for multimodal HAR systems.
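The abstract describes the CoMFL pipeline only at a high level, and the paper's implementation details are not given here. The following PyTorch sketch is therefore an illustrative assumption of how the described components could fit together: lightweight per-device encoders, a reconstruction network for missing modalities, a fusion super model on the personalized local server, and a simple FedAvg-style aggregation of super-model weights. All class and function names (DeviceEncoder, ReconstructionNet, FusionSuperModel, federated_average) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class DeviceEncoder(nn.Module):
    """Lightweight local encoder assumed to run on a consumer device
    (smartphone, smartwatch, or smart speaker)."""
    def __init__(self, in_dim: int, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, x):
        return self.net(x)


class ReconstructionNet(nn.Module):
    """Reconstructs the feature vector of one missing modality
    from the features of the available modalities, before fusion."""
    def __init__(self, feat_dim: int = 64, n_modalities: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim * (n_modalities - 1), 128),
            nn.ReLU(),
            nn.Linear(128, feat_dim))

    def forward(self, available_feats):
        # available_feats: list of feature tensors from the present modalities
        return self.net(torch.cat(available_feats, dim=-1))


class FusionSuperModel(nn.Module):
    """Super model assumed to run on the personalized local server:
    fuses the encoded (or reconstructed) features and predicts the activity."""
    def __init__(self, feat_dim: int = 64, n_modalities: int = 3, n_classes: int = 10):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim * n_modalities, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, feats):
        # feats: list of one feature tensor per modality
        return self.classifier(torch.cat(feats, dim=-1))


def federated_average(state_dicts):
    """FedAvg-style aggregation of super-model weights across users;
    only model updates leave each user's private zone, never raw sensor data."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}
```

Under these assumptions, the encoders stay on the consumer devices, the reconstruction and fusion stages run on the user's personalized local server, and only the super-model weights are shared for aggregation into the generalized global super model.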