fiat_lux

Relocated from: @fiat_lux@lemmy.world (04-2026)

  • 0 Posts
  • 6 Comments
Joined 6 days ago
Cake day: April 24th, 2026

  • Link is to a shit pdf on a Proton Drive. It’s a basic description of the Google auction house. The prices they list are largely driven by the bids advertisers place, but that’s not to say Google doesn’t set a higher minimum for certain demographic segments; it very much does, as does Facebook etc.

    For example, one reason parents are worth less is the products associated with them. Diapers cost less than business lawyers, so the margins are much slimmer, and advertisers aren’t going to bid as much for an ad placement.

    It does miss one thing that is, in my opinion, one of the more revolting aspects of their auction house. As a bidder, your dollar can be worth less than a big company’s dollar, even as little as one tenth of it. You could bid a million dollars on an ad space that Apple bid only $100,001 on, and you’d lose. That gap is dynamically calculated (at least in part) based on comparative search rankings.
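    The mismatch described above, where a dollar from a small bidder buys less than a dollar from a big brand, can be sketched as a weighted auction. This is a toy model with invented multiplier values, not Google’s actual (non-public) mechanism:

```python
# Toy weighted ad auction: each dollar bid is scaled by a per-advertiser
# multiplier before ranking. Multiplier values are invented for illustration;
# the real mechanism and its inputs are not public.

def run_auction(bids):
    """bids: dict of advertiser -> (dollar_bid, multiplier). Returns the winner."""
    # Effective bid = dollars * multiplier, so a small advertiser's dollar
    # can be worth a fraction of a big advertiser's dollar.
    ranked = sorted(bids.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
    return ranked[0][0]

bids = {
    "small_shop": (1_000_000, 0.1),  # $1M bid, weighted down to ~$100k
    "big_brand": (100_001, 1.0),     # ~$100k bid at full weight
}
print(run_auction(bids))  # big_brand wins despite bidding ~10x less
```

    With a 0.1 multiplier, the small shop’s $1,000,000 is effectively $100,000, so the $100,001 bid wins with a tenth of the money, matching the Apple example above.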

    Here’s the text without their ad at the end:

    The Price of Free Google

    What the Ad Industry Pays to Target Americans

    A Proton Mail analysis of 54,216 advertiser-defined profiles across the U.S.

    The price of your attention

    Every user has a price

    Every Google search triggers an invisible, real-time auction where advertisers bid for access to your attention. These bids are calculated in milliseconds based on how likely you are to spend. This is how the system decides what you are worth to advertisers.

    Proton analyzed 54,216 advertiser-defined profiles across 251 U.S. cities using real ad-market pricing.

    ● Highest-value user: $17,929/year
    ● Lowest-value user: $31/year

    That’s a 577x difference. This disparity is not an anomaly — it is the business model.

    “Google doesn’t just build a profile from the information you knowingly provide. If you sign up for services, click ads, or ignore others, that creates signals the system can use to infer much more than you realize. It can start with age or interests, then expand into assumptions about income, family status, political leanings, or religion.
    When the system isn’t sure, it tests those assumptions by serving different ads, links, or recommendations and watching how you respond. It isn’t just tracking who you are. It’s constantly learning, so it can price access to you more precisely.”
    — Eamonn Maguire, Director of Engineering, Machine Learning & AI

    Who the system values most — and least

    These two profiles illustrate how the same system assigns radically different value.

    $17,929/year
    ● 35–44, male
    ● Bozeman, MT
    ● Not a parent
    ● Desktop, heavy user

    High-intent, high-margin services:
    ● business lawyer
    ● home renovation
    ● golf courses

    $31/year
    ● 18–24, male
    ● Fort Smith, AR
    ● Parent
    ● Android, casual user

    Price-sensitive, lower-margin searches:
    ● cheap diapers
    ● family apartments
    ● toddler clothes

    Same system. Same country. 577x difference.

    Value is not distributed equally
    The gap between the average and the median shows that a small number of high-value users disproportionately influence the system.

    The top 10% of users generate 43% of total value.

    ● Average value: $1,605/year
    ● Median value: $760/year

    Most users are worth far less than the system’s top performers.
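    The mean/median gap described above is what a right-skewed distribution looks like. A quick sketch with entirely made-up user values (not the report’s data) shows how a few high-value users drag the mean above the median:

```python
import statistics

# Hypothetical per-user annual ad values, invented for illustration:
# a long right tail where a handful of users are worth far more than the rest.
values = [31, 200, 400, 600, 760, 900, 1200, 2500, 5000, 17929]

mean = statistics.mean(values)      # pulled up by the top of the tail
median = statistics.median(values)  # what the typical user is worth

print(f"mean={mean:.0f}, median={median:.0f}")  # mean=2952, median=830
```

    Here two users out of ten account for most of the total, so the mean lands well above what the typical user is worth, the same shape as the report’s $1,605 average vs $760 median.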

    How your value is calculated

    Your value is constantly recalculated

    Your value is not fixed. It is continuously recalculated based on signals that predict the likelihood of a commercially valuable action.

    These signals include:
    ● What you search
    ● When you search
    ● What device you use
    ● Who you are inferred to be

    High-intent searches — such as legal services, insurance, or financial products — command significantly higher prices than general browsing or informational queries. Your value can change from one moment to the next depending on what you do. In this system, behavior matters more than time spent.
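    A minimal sketch of the kind of scoring being described, where query category, device, and usage jointly set a price. All signal names and weights here are invented for illustration; real ad systems are vastly more complex:

```python
# Toy per-query value estimate. Category and device weights are invented
# for illustration only; real systems use many more signals.

CATEGORY_WEIGHTS = {          # high-intent categories command higher prices
    "legal": 50.0,
    "insurance": 30.0,
    "informational": 1.0,
}
DEVICE_WEIGHTS = {"desktop": 2.0, "iphone": 1.2, "android": 0.6}

def estimated_value(category, device, heavy_user):
    base = CATEGORY_WEIGHTS.get(category, 1.0)
    score = base * DEVICE_WEIGHTS.get(device, 1.0)
    return score * (3.0 if heavy_user else 1.0)  # usage acts as a multiplier

# The same user's value shifts from one query to the next:
print(estimated_value("legal", "desktop", True))          # 300.0
print(estimated_value("informational", "desktop", True))  # 6.0
```

    The point of the sketch is that value attaches to the moment, not the person: identical users diverge by 50x between a legal-services search and an informational one.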

    The signals behind the price

    Your device changes your value

    Device usage has a measurable impact on how users are valued.
    ● Desktop: $2,894/year
    ● iPhone: $1,338/year
    ● Android: $585/year

    Desktop users are worth nearly 5x more than Android users — even when everything else is the same.

    These differences reflect observed behavior — including conversion rates and commercial intent — not the cost of the device itself. Your device becomes a proxy for purchasing behavior.

    Parents are systematically valued less

    Parental status affects how users are priced within the system.

    Non-parents are worth ~17% more on average.

    The gap increases during peak earning years:
    ● 25–34: +24%
    ● 35–44: +34.5%

    Having children reduces your perceived commercial value.

    Same age — same location — same device. Different value.

    Value peaks in midlife

    User value is highest between the ages of 25 and 44.

    This period corresponds with:
    ● Major financial decisions
    ● High-value purchases
    ● Career-related services

    As users age, overall value declines — but does not disappear. For users 65+, approximately 75% of value is concentrated in:

    ● Health
    ● Real estate
    ● Financial planning

    The system adapts by narrowing focus rather than reducing targeting.

    Gender is not a primary driver of value

    Gender has a measurable but limited impact on how users are priced within the ad ecosystem.

    Average values across genders are broadly similar — with differences in the single digits.

    Differences in value are driven primarily by how advertisers price categories of demand — not by gender alone. Higher-value industries — such as finance, legal services, and B2B technology — tend to influence outcomes more strongly than identity itself.

    As a result, gender can affect value indirectly, but it is not a consistent or defining factor.

    Where you live affects what you’re worth

    Local economies shape how much advertisers are willing to pay for access to users.

    Location alone can dramatically change what you’re worth.

    Highest-value markets include:

    1. Edmond, OK
    2. Bozeman, MT
    3. Naperville, IL
    4. Santa Fe, NM
    5. Durham, NC

    Lowest-value markets include:
    247. Greensboro, NC
    248. Gulfport, MS
    249. Fort Smith, AR
    250. Lowell, MA
    251. West Valley City, UT

    More usage means more value

    Frequency of use acts as a multiplier on user value.

    ● Heavy users: $3,611/year
    ● Average users: $843/year
    ● Casual users: $362/year

    Heavy users generate nearly 10x more value than casual users. More usage doesn’t just increase your value — it multiplies it.

    This creates strong incentives to maximize engagement.


  • > We can see that it’s solved by the fact that AI models continue to get better despite an increasing amount of AI-generated data being present in the world that training data is being drawn from.

    Even if it logically followed that model improvement means model collapse is solved, which it absolutely doesn’t, the premise itself, that models are improving to a significant degree, is up for debate.

    [Line graph: Massive Multitask Language Understanding (MMLU) Pro benchmark scores over time, 07-2023 to 01-2026, showing plateauing values]

    > A lot of people really want to believe that AI is going to just “go away” somehow, and this notion of model collapse is a convenient way to support that belief

    Model collapse may for some people be an argument used to support a hope that AI will go away, but the reality of that hope does not alter the validity of the model collapse problem.

    You can tell it’s not a solved problem because researchers are still trying to quantify the risk and severity of collapse - as you can see even just from the abstracts in the links I provided.

    Some choice excerpts from the abstracts, for those who don’t want to click the links:

    > Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse

    > …we establish … that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions … are indeed necessary: Without them, model collapse can occur arbitrarily quickly, even when the original data is still present in the training set.


  • It can’t only be from data from previous generations, even if the initial demonstration used that, because that would mean a single piece of human-generated text is sufficient to avoid collapse.

    The loss of data from generation to generation is one way model collapse can occur, but it’s only one way. The actual issues that cause collapse are replication of errors and increasing data homogeneity. In a world where an unknown quantity of new data is AI generated, it is not possible to ensure only a certain quantity is used as future training data.
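    The homogeneity mechanism can be shown with a deterministic toy: a “model” that reproduces only its most frequent training examples, a crude stand-in for the mode-seeking behavior of real generative models. The corpus and the hard top-k cutoff are invented for illustration:

```python
from collections import Counter

# Toy illustration of diversity loss: each "generation" trains on the previous
# generation's output but reproduces only its top_k most common examples, in
# proportion to their frequency. Rare variants vanish and never come back.

corpus = ["the cat sat"] * 60 + ["a cat sat"] * 30 + ["one cat sat"] * 10

def next_generation(texts, top_k=2):
    common = Counter(texts).most_common(top_k)      # truncate the long tail
    total = sum(count for _, count in common)
    # Emit ~100 samples, distributed among the surviving variants.
    return [t for t, count in common for _ in range(round(100 * count / total))]

gen = corpus
for _ in range(3):
    gen = next_generation(gen)

print(sorted(set(gen)))  # ['a cat sat', 'the cat sat'] -- the rare variant is gone
```

    Real training is statistical rather than a hard cutoff, but the direction is the same: each pass over model-generated data re-weights toward the modes, and low-frequency material is the first casualty.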

    Additionally, as new human-generated content draws on information provided by AI, even when AI isn’t used in constructing the text itself, the error-replication and data-diversity issues cross over from being an AI-generated-content problem to an all-content problem. You can see examples of this happening now in the media, where a journalist relies on AI output for fact-checking and the article with the error then gets republished by other outlets.

    Real AI training methods may stave off some model collapse, if we ignore existing issues around the cultural homogeneity of training data from across all time periods, or assume the models are sufficiently weighted to mitigate those issues, but it’s by no means settled that collapse is a non-problem.

    You’ve mentioned using data mixing to prevent collapse, but some of the research suggests that even iterative mixing isn’t sufficient, depending on the quantities of real vs synthetic data. Strong Model Collapse (2024) by Dohmatob, Feng, Subramonian, and Kempe goes into that, and since then there’s been When Models Don’t Collapse: On the Consistency of Iterative MLE (2025) by Barzilai and Shamir, which presents one theoretical case where collapse won’t occur provided some assumptions hold, but the math is beyond me. They also note multiple situations where near-instant collapse can occur.

    How much data poisoning might affect any of that is not at all clear; it would need to be present in sufficient quantity for a given model to have an effect, but it certainly wouldn’t help. The recent Bixonimania scandal suggests it’s feasible.


  • > “model collapse” was demonstrated by repeatedly training generation after generation of models on the output of previous generations

    > the best models these days are trained largely on synthetic data - data that’s been pre-processed by other AIs to turn it into stuff that makes for better training material

    > You can prevent model collapse simply by enriching the training data with good data - stuff that is already archived, that can’t be “contaminated.”

    This feels like an odd juxtaposition.

    If model collapse can be avoided by enriching with uncontaminated data, and model collapse comes from using training data generated by previous generations, doesn’t that imply that:

    1. Either the best models are headed towards model collapse, or,
    2. Models can’t be updated because modern data isn’t usable?


  • There are two surprising aspects of this to me. Firstly that the employees feel confident enough to express concern about Palantir’s actions in official channels. I would have thought that the nature of their work was obvious enough that this would be a cultural taboo and therefore self-censored. I guess some of them have limits to suspending disbelief for what they had likely internally framed as “work for the benefit of national security” or “job pays too well to care”.

    The second part is that not all of this official-channel discussion was immediately wiped by Palantir, but perhaps they too relied on self-censorship to prevent these conversations at scale.

    Either way, I’m somewhat relieved there’s someone at Palantir worried about this at all. The more of them who are worried by this, the more leaks we’ll see.