Researchers published a massive database of more than 2 billion Discord messages that they say they scraped using Discord’s public API. The data was pulled from 3,167 servers and covers posts made between 2015 and 2024, the entire time Discord has been active.

Though the researchers claim they’ve anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file online. Separately, another programmer released a Discord tool called “Searchcord,” built on a different data set, that shows non-anonymized chat histories.

    • CosmicTurtle0@lemmy.dbzer0.com
      2 days ago

      I skimmed through their paper and I can’t seem to find the instructions to download the dataset.

      I found this particularly cute:

      This study introduces the Discord Unveiled Dataset, a comprehensive and ethically curated resource encompassing over 3,000 public servers and 2 billion messages exchanged on Discord.