- cross-posted to:
- technews@blendit.bsd.cafe
‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products
Except they literally don’t. Human memory doesn’t retain an exact copy of things. Very good isn’t the same as exactly. And human beings can’t grab everything they see and instantly use it.
Machine learning doesn’t retain an exact copy either. How on earth do you think a model trained on terabytes of data can be only a few gigabytes in size, yet contain “exact copies” of everything? If “AI” could function as a compression algorithm, it’d definitely be used as one. But it can’t, so it isn’t.
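To put rough numbers on the size argument, here is a back-of-the-envelope sketch. All three figures are made-up round numbers chosen for illustration, not measurements of any real model or dataset, and `bytes_per_item` is just a hypothetical helper name.

```python
# Back-of-the-envelope check of the "terabytes in, gigabytes out" argument.
# All figures below are illustrative assumptions, not real measurements.

def bytes_per_item(model_size_bytes: int, num_training_items: int) -> float:
    """Average number of model bytes available per training item."""
    return model_size_bytes / num_training_items

MODEL_SIZE_BYTES = 4 * 10**9       # assumed: a 4 GB model file
NUM_TRAINING_ITEMS = 2 * 10**9     # assumed: 2 billion training images
DATASET_SIZE_BYTES = 100 * 10**12  # assumed: 100 TB of training data

budget = bytes_per_item(MODEL_SIZE_BYTES, NUM_TRAINING_ITEMS)
ratio = DATASET_SIZE_BYTES / MODEL_SIZE_BYTES

print(f"Model bytes per training item: {budget}")
print(f"Dataset is {ratio:.0f}x larger than the model")
```

Under those assumptions the model has about two bytes of capacity per training image on average, which is far too little to hold a copy of each one. It is only an average, though, so it doesn’t rule out a model memorizing items that show up many times in the training set.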
Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an “exact” re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.
Here’s an image from an article claiming that machine learning image generators plagiarize things.
However, if you take a second to look at the image, you’ll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren’t one-to-one copies. It doesn’t take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.
If you got ahold of a human artist specializing in photoreal drawings and asked them to re-create a specific part of a movie they’ve seen a couple dozen or hundred times, they’d most likely produce something remarkably similar in accuracy, very close to what machine learning image generators are capable of at the moment.