OpenAI’s ChatGPT and Sam Altman are in massive trouble. OpenAI is being sued in the US for allegedly using content from the internet illegally to train its large language models (LLMs)

  • millie@kbin.social
    1 year ago

    There’s a line here that is a little ambiguous.

    If I create a program that’s designed to learn to play video games, do I need to specifically get the consent of the developers of all games that I have legal access to? Do I need to be able to redistribute a piece of IP before I can make use of it to train an AI?

    That doesn’t seem right.

    Do I need to own a copyright before I can use Dark Reader on a webpage? To use accessibility software? Ad blockers?

    Do I need to own a piece of music in order to learn to play it? To learn about composing from it and take it as a source of inspiration?

    It seems to me that if you’re putting your content out there for all the world to see, the world seeing that through the lens of a program they wrote and making use of that experience to teach their program to understand language and visual representations ought to be within the realm of the reasonable and expected.

    We live in a world where our data is gathered sneakily on a regular basis in order to build massively invasive personality profiles on us that do us no good and make a massive profit for others. Everybody’s data is already being stolen. But this uses information that’s out there for anyone to take and hands us something of incredible value in return that gives tremendous power to individuals. It learns from us and we learn from it. Seems like a fair trade.

    LLMs are a tremendous resource that we really need to protect public access to.

    • chaogomu@kbin.social
      1 year ago

      The main difference is that if you as a person learn music, or play video games, or anything else where you take someone else’s work and make it your own, you are making it your own.

      ChatGPT and other AI like it only regurgitate their training data in a mishmash of almost, but not quite, nonsense. There’s no “reimagining” and no creativity. It’s just a literal rehash of the training data.

      It’s even in the name: the P in ChatGPT stands for “pre-trained”. It isn’t learning anything new. It’s just spitting out bits and pieces of what you originally fed it, and that is copyright infringement with extra steps.