• De_Narm@lemmy.world
    8 days ago

    That’s a problem across the board. Assuming AI does establish itself, all its training data dries up and we basically stagnate.

    Also, in this weird in-between phase until it is actually good, we’ve already generated so much bullshit that AI ends up training on the hallucinations of other AIs.

    • Lexxly@lemmy.ca
      8 days ago

      At some point we will train it on live data, similar to how human babies learn. There’s always more data.

      • FaceDeer@fedia.io
        7 days ago

        And the AIs themselves can generate data. There have been a few recent news stories about AIs doing novel research; that will only become more prevalent over time.

        • 8uurg@lemmy.world
          7 days ago

          A big catch, though, is that whatever is generated needs to be verified. The most recent story I’ve seen was an AI proposing the hypothesis that a particular drug increases antigen presentation, which could turn cold tumors (those the immune system does not attack) into hot tumors (those the immune system does attack). The key news here is that this hypothesis was found to be correct: an experiment has shown that said drug does have this effect. (link to Google’s press release)

          The catch here is that I have not seen any info on how many hypotheses were generated to find this correct one. It doesn’t have to be perfect: research often leads to a hypothesis being rejected, even when it was proposed by a person rather than an AI. However, the signal-to-noise ratio still matters for how game-changing this will be. As in this blogpost, it can fail to identify a solution at all, or even return incorrect hypotheses. You can’t simply use this output as further training data for the LLM, as it would only degrade performance.

          There needs to be a verification and filtering step first. Wikipedia has played such a role for a very long time: editors reference sources and verify the trustworthiness of those sources. If Wikipedia goes under because of this, either due to a lack of funding or due to a lack of editors, a very important source will be lost.
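
To make the "verify and filter first" point concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `Hypothesis` record, the `verified` flag, and the example entries are assumptions, not anything from the press release or the blogpost. The idea is just that only hypotheses confirmed by an experiment go into the training pool, and that the hit rate among tested hypotheses is the signal-to-noise figure worth tracking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """Hypothetical record for one generated hypothesis."""
    text: str
    # True = confirmed by experiment, False = rejected, None = not tested yet
    verified: Optional[bool]


def filter_verified(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Keep only hypotheses that an experiment has confirmed."""
    return [h for h in hypotheses if h.verified is True]


def hit_rate(hypotheses: List[Hypothesis]) -> Optional[float]:
    """Fraction of tested hypotheses that were confirmed (None if nothing was tested)."""
    tested = [h for h in hypotheses if h.verified is not None]
    if not tested:
        return None
    return sum(1 for h in tested if h.verified) / len(tested)


# Made-up example data, purely illustrative.
candidates = [
    Hypothesis("drug A increases antigen presentation", True),
    Hypothesis("drug B increases antigen presentation", False),
    Hypothesis("drug C increases antigen presentation", None),
    Hypothesis("drug D increases antigen presentation", False),
]

training_pool = filter_verified(candidates)
rate = hit_rate(candidates)
```

Untested hypotheses are deliberately left out of both the pool and the hit rate, since without verification there is no way to know whether they are signal or noise.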