Poisoning-Free Defense Against Black-Box Model Extraction
Recent research has shown that an adversary can use a surrogate model to steal the functionality of a target deep learning model even in the black-box setting and without curated data, while existing defenses rely mainly on API poisoning to disturb surrogate training. Unfortunately, such poisoning achieves the defense at the price of fidelity loss, sacrificing the interests of honest users. To solve this problem, we propose an Adversarial Fine-Tuning (AdvFT) framework that incorporates a generative adversarial network (GAN) structure to disturb the feature representations of out-of-distribution (OOD) queries while preserving those of in-distribution (ID) ones, circumventing the need for OOD sample collection and API poisoning. Extensive experiments verify the effectiveness of the proposed framework. Code is available at github.com/Hatins/AdvFT.
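To make the idea concrete, below is a minimal sketch (not the authors' released implementation) of a GAN-style adversarial fine-tuning step: a generator proposes synthetic OOD-like queries, and the defended feature extractor is tuned so that ID features stay close to a frozen copy of the original model while features of the generated OOD queries are pushed away from it. All module names, losses, and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only; architectures, losses, and weights are assumed,
# not taken from the AdvFT paper or repository.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps random noise to image-shaped synthetic OOD queries."""
    def __init__(self, z_dim=100, img_shape=(3, 32, 32)):
        super().__init__()
        self.img_shape = img_shape
        out_dim = int(torch.prod(torch.tensor(img_shape)))
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, out_dim), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z).view(z.size(0), *self.img_shape)

def adv_finetune_step(feat_extractor, frozen_ref, generator, id_batch,
                      opt_f, opt_g, lam=1.0, z_dim=100):
    """One alternating update (hypothetical losses and weighting)."""
    # --- Extractor update: preserve ID features, disturb OOD features.
    z = torch.randn(id_batch.size(0), z_dim)
    ood_batch = generator(z).detach()
    f_id, f_ood = feat_extractor(id_batch), feat_extractor(ood_batch)
    with torch.no_grad():
        ref_id, ref_ood = frozen_ref(id_batch), frozen_ref(ood_batch)
    loss_f = F.mse_loss(f_id, ref_id) - lam * F.mse_loss(f_ood, ref_ood)
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()

    # --- Generator update: search for OOD queries whose features the
    #     defense has not yet disturbed (adversarial game).
    ood_batch = generator(torch.randn(id_batch.size(0), z_dim))
    loss_g = F.mse_loss(feat_extractor(ood_batch), frozen_ref(ood_batch))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_f.item(), loss_g.item()

if __name__ == "__main__":
    # Toy feature extractor and data stand in for the victim model and ID set.
    feat = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    ref = copy.deepcopy(feat).eval()
    for p in ref.parameters():
        p.requires_grad_(False)
    gen = Generator()
    opt_f = torch.optim.Adam(feat.parameters(), lr=1e-4)
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
    id_images = torch.rand(8, 3, 32, 32)  # stand-in for a real ID batch
    print(adv_finetune_step(feat, ref, gen, id_images, opt_f, opt_g))
```

Because the OOD queries are synthesized by the generator, this kind of scheme needs no collected OOD dataset, and because only internal features (not returned predictions on ID queries) are altered, no API poisoning is involved; this mirrors the claim in the abstract above.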
Journal/Conference/Book title: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Publication date: 2024-04-14
Version: Post-print