Singapore Institute of Technology

Poisoning-Free Defense Against Black-Box Model Extraction

conference contribution
posted on 2024-09-16, 04:55, authored by Haitian Zhang, Guang Hua, Wen Yang

Recent research has shown that an adversary can use a surrogate model to steal the functionality of a target deep learning model, even under the black-box condition and without data curation. Existing defenses rely mainly on API poisoning to disturb surrogate training; unfortunately, this poisoning comes at the price of fidelity loss, sacrificing the interests of honest users. To solve this problem, we propose an Adversarial Fine-Tuning (AdvFT) framework that incorporates a generative adversarial network (GAN) structure to disturb the feature representations of out-of-distribution (OOD) queries while preserving those of in-distribution (ID) ones, circumventing the need for both OOD sample collection and API poisoning. Extensive experiments verify the effectiveness of the proposed framework. Code is available at github.com/Hatins/AdvFT.
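To make the idea above concrete, the following is a minimal PyTorch-style sketch of one adversarial fine-tuning step in the spirit described by the abstract: a generator stands in for the uncollected OOD queries, the defended (victim) model is fine-tuned to keep its behaviour on ID data faithful while collapsing its feature representations on generated OOD inputs. Every name here (advft_step, the assumption that the victim returns a (features, logits) pair, the variance-based losses, the weight lam) is an illustrative assumption, not the authors' released implementation; consult github.com/Hatins/AdvFT for the actual method.

```python
import torch
import torch.nn.functional as F

def advft_step(victim, generator, x_id, z, opt_v, opt_g, lam=1.0):
    """One illustrative adversarial fine-tuning step (sketch, not the paper's code).

    victim:    defended model, assumed to return (features, logits).
    generator: maps noise z to synthetic queries that stand in for OOD
               samples, so no real OOD data needs to be collected.
    x_id:      batch of in-distribution samples (batch size > 1 assumed,
               since variance is taken over the batch dimension).
    """
    # Frozen reference behaviour on ID data, used as the fidelity target.
    with torch.no_grad():
        f_id_ref, y_id_ref = victim(x_id)

    # --- Generator step: produce queries whose victim features stay
    #     diverse/informative, i.e. hard cases for the defense. ---
    f_ood, _ = victim(generator(z))
    g_loss = -f_ood.var(dim=0).mean()   # maximize OOD feature variance
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # --- Victim step: preserve ID behaviour, disturb OOD features. ---
    f_id, y_id = victim(x_id)
    fidelity = F.mse_loss(y_id, y_id_ref) + F.mse_loss(f_id, f_id_ref)

    f_ood, _ = victim(generator(z).detach())   # detach: don't update G here
    disturb = f_ood.var(dim=0).mean()          # collapse OOD features

    v_loss = fidelity + lam * disturb
    opt_v.zero_grad()
    v_loss.backward()
    opt_v.step()
    return v_loss.item()
```

The design intuition, under these assumptions: honest (ID) queries see unchanged outputs because the fidelity term pins the fine-tuned model to its original behaviour, so no API poisoning is needed, while the disturb term makes OOD feature representations uninformative, degrading the signal a surrogate model can extract from stolen query responses.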

History

Journal/Conference/Book title

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Publication date

2024-04-14

Version

  • Post-print

Rights statement

© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Corresponding author

Guang Hua
