• MigratingtoLemmy@lemmy.world
    English · 2 upvotes · 5 months ago

    If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let’s make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral’s engineers lead that charge, and put an AGPL license on the project so that companies can’t fuck us over.

    I refuse to believe that nobody has thought of this yet.

    • bandwidthcrisis@lemmy.world
      English · 1 upvote · 5 months ago

      An AI trained on old Internet material would be like a synthetic Grandpa Simpson:

      “In my day we said ‘all your base’ and laughed all day long, because it took all day to download the video.”

  • DrCake@lemmy.world
    English · 2 upvotes · 5 months ago

    So when’s the ruling against OpenAI and the like for using the same copyrighted material to train their models?

    • irotsoma@lemmy.world
      English · 2 upvotes · 5 months ago

      But OpenAI not being allowed to use the content for free means they are being prevented from making a profit, whereas the Internet Archive is giving away the stuff for free and taking away the right of the authors to profit. /s

      Disclaimer: this is the argument that OpenAI is using currently, not my opinion.

  • HexesofVexes@lemmy.world
    English · 1 upvote · 5 months ago

    Ah, I see we’re burning the Library of Alexandria again… Just as with last time, the survival of texts will rely upon copies.

  • Lettuce eat lettuce@lemmy.ml
    English · 1 upvote · 5 months ago

    Artificial scarcity at its finest. Imagine recording a song digitally, then pretending there are a limited number of copies of that song in existence. Then you sell an agreement to another person that says they have to pretend there is only a certain made-up number of copies that they bought, and if they allow more than that number of people to listen to those copies at the same time, they will get sued for “stealing” additional pretend copies.

    I hope everybody can see how this is the insane and pathetic result of Capitalism’s unrelenting drive to commodify everything it possibly can in the pursuit of profit.

    As always, the solution is sailing the high seas. Throughout history, those who created or saved illegal copies/translations of literature and art were important to preserving and furthering human knowledge.

    Many incredibly powerful people, empires, and countries have tried very hard to suppress that, but they keep failing. You cannot suppress the human drive for curiosity and knowledge.

    • Ming@lemmy.dbzer0.com
      English · 1 upvote · 5 months ago

      True, and the fleet is big and strong. There are many people seeding hundreds of terabytes of books/research papers/etc. The knowledge will not be lost. Yarr, can’t catch me on the high seas…

  • Stern@lemmy.world
    English · 1 upvote · 5 months ago

    Oh sure, when I want to read copyrighted books it’s an issue, but when OpenAI does it, it’s vital to their business so they can keep going.

  • masterspace@lemmy.ca
    English · 1 upvote · 5 months ago

    Fuck Copyright.

    A system for distributing information and rewarding its creators should not be one based on scarcity, given that it costs nothing to copy and distribute information.

    • snooggums@midwest.social
      English · 1 upvote · 5 months ago

      It was fine when the limited duration was a reasonable number of years. Anything over 30 years before a work enters the public domain is too long.

      • Fuzzy_Red_Panda@lemm.ee
        English · 0 upvotes, 1 downvote · 5 months ago

        Yeah. In a better world where the US court system doesn’t get weaponized and rulings aren’t delayed for years or decades, I would argue 8 to 15 years is the reasonable number, depending on the type of information being copyrighted.

    • Parabola@lemmy.world
      English · 0 upvotes, 1 downvote · 5 months ago

      If only the readme clearly said what it was with a link you could click…