Towards a Scalable and Privacy-Preserving Audio Surveillance System
The human voice is a passive biometric that a surveillance system can use to uniquely identify individuals. It allows law enforcement agencies to detect and track suspects by deploying capturing devices (such as microphones) within a given region. To address the clear privacy concerns of such an approach, we propose an efficient way of detecting suspects in public areas through their voices while preserving the privacy of innocent individuals. Our approach is well suited to large-scale surveillance systems, where millions of recordings are analyzed every day. Our privacy-preserving model is built on top of the most accurate speaker recognition systems, and we show that the accuracy loss due to the added privacy-preserving layer is negligible. This layer employs a highly efficient cryptosystem to securely compute the similarity scores between the captured utterances and those stored in the suspects' database. Specifically, for each suspect, the system computes the encrypted Probabilistic Linear Discriminant Analysis (PLDA) score and obliviously matches it against a set threshold. We also show that our computation and communication overhead is significantly lower than that of state-of-the-art techniques, which enables real-time surveillance. Our protocol requires a single round of communication between the server and the capturing device. For a database of 100 suspects, the online computation time is only 135 ms on the capturing device and 35 ms on the server, and the required communication is 12 KB.
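As a rough illustration of the scoring step described in the abstract, the sketch below computes a PLDA score for each suspect and compares it against a threshold, entirely in plaintext. It assumes the common quadratic form of the PLDA log-likelihood ratio (up to an additive constant), which may differ from the formulation used in the paper. It also omits the encryption and the oblivious comparison, which are the paper's actual contribution. The matrices `p` and `q`, the toy embeddings, the threshold, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear(x: Vector, m: Matrix, y: Vector) -> float:
    """Return x^T M y for a dense matrix M."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def plda_score(probe: Vector, enrolled: Vector, p: Matrix, q: Matrix) -> float:
    """Quadratic-form PLDA score (assumed form, additive constant dropped)."""
    return (
        bilinear(probe, q, probe)
        + bilinear(enrolled, q, enrolled)
        + 2.0 * bilinear(probe, p, enrolled)
    )


def match_suspects(
    probe: Vector, suspects: Sequence[Vector], p: Matrix, q: Matrix, threshold: float
) -> List[bool]:
    """Return one plaintext match decision per enrolled suspect."""
    return [plda_score(probe, s, p, q) >= threshold for s in suspects]


# Hypothetical 2-dimensional toy parameters, chosen only for illustration.
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[-0.25, 0.0], [0.0, -0.25]]
probe = [1.0, 0.0]
suspects = [[1.0, 0.0], [-1.0, 0.0]]

print(match_suspects(probe, suspects, P, Q, threshold=0.0))  # [True, False]
```

In a privacy-preserving setting, neither the scores nor the non-matching decisions would be revealed in the clear as they are here.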
History
Journal/Conference/Book title
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication date
2024-12-11
Version
- Post-print