The New York Times is suing OpenAI and Microsoft for copyright infringement

btp@kbin.social · 1 year ago


Zima@kbin.social · edit-2 1 year ago

that’s the theory. previous models were also supposed to be doing 3-digit math, but they discovered that the questions were in the training data.

so you should look into what happens when people ask ChatGPT to repeat a word forever: it prints the word for a while and then starts printing training data. check this link: https://www.404media.co/google-researchers-attack-convinces-chatgpt-to-reveal-its-training-data/

edit: relevant part:

It also, crucially, shows that ChatGPT’s “alignment techniques do not eliminate memorization,” meaning that it sometimes spits out training data verbatim. This included PII, entire poems, “cryptographically-random identifiers” like Bitcoin addresses, passages from copyrighted scientific research papers, website addresses, and much more.

“In total, 16.9 percent of generations we tested contained memorized PII,”

I should also reiterate that I agree that the intent is to avoid memorization, but they are not successful yet.
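
for anyone who wants to poke at this themselves, here is a rough sketch of how you could flag "memorized" output: check whether a generation shares a long verbatim run of words with some reference text. to be clear, this is my own toy illustration, not the researchers' method. the function names, the whitespace tokenization, and the 8-word threshold are all assumptions I picked for the example.

```python
def ngrams(tokens, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contains_verbatim_run(generation, corpus_docs, n=8):
    """True if the generation shares at least one n-word run with any document.

    Assumption: plain whitespace tokenization and n=8 are arbitrary choices
    for illustration, not values taken from the article.
    """
    gen_grams = ngrams(generation.split(), n)
    if not gen_grams:
        # Generation is shorter than n words, so no run of length n exists.
        return False
    return any(gen_grams & ngrams(doc.split(), n) for doc in corpus_docs)


def memorized_fraction(generations, corpus_docs, n=8):
    """Fraction of generations that contain a verbatim n-word run."""
    if not generations:
        return 0.0
    hits = sum(contains_verbatim_run(g, corpus_docs, n) for g in generations)
    return hits / len(generations)


# Hypothetical toy data: one "training" document and two model outputs.
corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
outputs = [
    "poem poem poem the quick brown fox jumps over the lazy dog",
    "poem poem poem poem poem poem poem poem poem",
]
fraction = memorized_fraction(outputs, corpus)
```

a real check would need a huge reference corpus and a much faster lookup than comparing sets per document, so treat this only as a way to see the idea.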