The tool, called Nightshade, messes up training data in ways that could cause serious damage to image-generating AI models. It is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission.
ARTICLE - Technology Review
ARTICLE - Mashable
ARTICLE - Gizmodo
The researchers tested the attack on Stable Diffusion’s latest models and on an AI model they trained themselves from scratch. When they fed Stable Diffusion just 50 poisoned images of dogs and then prompted it to create images of dogs itself, the output started looking weird—creatures with too many limbs and cartoonish faces. With 300 poisoned samples, an attacker can manipulate Stable Diffusion so that prompts for dogs generate images that look like cats.
I finally found the paper.
The attack is not unique to this program; the paper cites several prior works. I haven’t read the cited works, but they seem to operate along the lines of Carlini and Wagner’s adversarial attack, which uses minor perturbations to manipulate a classifier’s output.
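For context, here is a minimal sketch of what that style of attack looks like, assuming a differentiable PyTorch classifier. The loss and hyperparameters are simplified placeholders, not Carlini and Wagner’s exact formulation (they use a margin-based loss and a change of variables).

```python
import torch
import torch.nn.functional as F

def cw_style_perturbation(model, x, target_class, c=1.0, steps=200, lr=0.01):
    """Find a small delta so that model(x + delta) predicts target_class.
    `model` is any differentiable classifier returning logits for a batch
    of images x in [0, 1]; all names and values here are illustrative."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class], device=x.device)

    for _ in range(steps):
        opt.zero_grad()
        logits = model(torch.clamp(x + delta, 0.0, 1.0))
        # trade-off: keep the perturbation small while pushing the
        # prediction toward the target class (cross-entropy stands in
        # for C&W's margin loss here)
        loss = delta.pow(2).sum() + c * F.cross_entropy(logits, target)
        loss.backward()
        opt.step()

    return torch.clamp(x + delta, 0.0, 1.0).detach()
```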
Here is the method:
Step 3: Constructing poison images {Image_p}. For each text prompt t ∈ {Text_p}, locate its natural image pair x_t in {Image}. Choose an anchor image x_a from {Image_anchor}. Given x_t and x_a, run the optimization of eq. (1) to produce a perturbed version x'_t = x_t + δ, subject to |δ| < p. Like [19], we use LPIPS [96] to bound the perturbation and apply the penalty method [46] to solve the optimization:

min_δ ‖F(x_t + δ) − F(x_a)‖₂² + α · max(LPIPS(δ) − p, 0)    (2)

Next, add the text/image pair t/x'_t into the poison dataset {Text_p/Image_p}, remove x_a from the anchor set, and move to the next text prompt in {Text_p}.
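To make the optimization concrete, here is a rough PyTorch sketch of eq. (2). It assumes `feature_extractor` stands in for the paper’s image feature extractor F (the feature space of the text-to-image model being targeted), uses the `lpips` package as one possible LPIPS implementation, and reads LPIPS(δ) as the perceptual distance between x_t and x_t + δ; the budget p, penalty weight α, step count, and learning rate are placeholder values, not the paper’s settings.

```python
import torch
import lpips  # pip install lpips; one choice of LPIPS implementation

def poison_image(x_t, x_a, feature_extractor, p=0.07, alpha=4.0, steps=500, lr=0.003):
    """Perturb x_t so its features match those of the anchor image x_a,
    while an LPIPS penalty keeps the perturbation under the budget p (eq. 2).
    x_t, x_a: image tensors in [-1, 1], shape [1, 3, H, W]."""
    perceptual = lpips.LPIPS(net='vgg')        # perceptual distance d(x_t, x_t + delta)
    delta = torch.zeros_like(x_t, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    with torch.no_grad():
        target_feat = feature_extractor(x_a)   # F(x_a), fixed target

    for _ in range(steps):
        opt.zero_grad()
        feat = feature_extractor(x_t + delta)  # F(x_t + delta)
        feature_loss = (feat - target_feat).pow(2).sum()
        # penalty term: only active once the perceptual distance between
        # x_t and x_t + delta exceeds the budget p
        overshoot = perceptual(x_t, x_t + delta).mean() - p
        loss = feature_loss + alpha * torch.clamp(overshoot, min=0.0)
        loss.backward()
        opt.step()

    return (x_t + delta).detach()              # poisoned image x'_t
```

The result pairs the original prompt t with an image whose pixels still look like x_t but whose features point at the anchor concept, which is what pulls the model toward A during training.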
Yes; they target a single concept C for poisoning by creating a gradient during training that pulls it toward a separate, specific anchor concept A.
Study: https://arxiv.org/abs/2310.13828v1