Excerpt:
“Even within the coding, it’s not working well,” said Smiley. “I’ll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven’t engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence.”
Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
“We don’t know what those are yet,” he said.
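For reference, the first four metrics Smiley lists are the well-known DORA measures. A minimal sketch of how a team might compute them from deployment records follows; the record shape here is invented for illustration, not any particular tool’s schema, and it assumes non-empty inputs:

```python
from datetime import timedelta

def dora_summary(deploys, period_days, restore_times):
    """deploys: list of {"lead_time": timedelta, "caused_incident": bool};
    restore_times: one timedelta per incident in the same period."""
    n = len(deploys)
    return {
        "deployment_frequency": n / period_days,  # deploys per day
        "lead_time_to_production": sum((d["lead_time"] for d in deploys), timedelta()) / n,
        "change_failure_rate": sum(d["caused_incident"] for d in deploys) / n,
        "mean_time_to_restore": sum(restore_times, timedelta()) / len(restore_times),
    }
```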
One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That’s the kind of thing that needs to be assessed to determine whether AI helps an organization’s engineering practice.
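One way to operationalize that idea; the names and record shape here are mine, purely illustrative:

```python
def tokens_per_approved_pr(attempts):
    """attempts: (tokens_burned, pr_was_approved) pairs, one per AI-assisted change."""
    total_tokens = sum(tokens for tokens, _ in attempts)
    approved = sum(1 for _, ok in attempts if ok)
    # Tokens spent on abandoned attempts get charged to the PRs that did land.
    return total_tokens / approved if approved else float("inf")

print(tokens_per_approved_pr([(120_000, True), (340_000, False), (95_000, True)]))  # 277500.0
```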
To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.
“It passed all the unit tests, the shape of the code looks right,” he said. “It’s 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It’s a dumpster fire. Throw it away. All that money you spent on it is worthless.”
All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.
“Coding works if you measure lines of code and pull requests,” he said. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”
Yeah these newer systems are crazy. The agent spawns a dozen subagents that all do some figuring out on the code base and the user request. Then those results get collated, then passed along to a new set of subagents that make the actual changes. Then there are agents that check stuff and tell the subagents to redo stuff or make changes. And then it gets a final check like unit tests, compilation etc. And then it’s marked as done for the user. The amount of tokens this burns is crazy, but it gets them better results in the benchmarks, so it gets marketed as an improvement. In reality it’s still fucking up all the damned time.
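For anyone who hasn’t watched one of these pipelines run, the loop being described is roughly this shape. A toy sketch; every function here is a hypothetical stand-in, not any vendor’s actual orchestration:

```python
import random

def explore(request):                # an analysis subagent
    return f"analysis of codebase for: {request}"

def implement(step):                 # an editing subagent producing a candidate change
    return {"step": step, "passes": random.random() > 0.3}

def review(change):                  # a checker agent / unit-test gate
    return change["passes"]

def run_task(request):
    findings = [explore(request) for _ in range(12)]   # fan out a dozen subagents
    plan = findings[:3]                                # collate findings into a plan
    changes = [implement(step) for step in plan]
    # Checker agents send failures back for rework; every pass burns more tokens.
    while not all(review(c) for c in changes):
        changes = [c if review(c) else implement(c["step"]) for c in changes]
    return changes                                     # final gate passed: marked done for the user

print(run_task("fix the login bug"))
```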
Coding with AI is like coding with a junior dev, who didn’t pay attention in school, is high right now, doesn’t learn and only listens half of the time. It fools people into thinking it’s better, because it shits out code super fast. But the cognitive load is actually higher, because checking the code is much harder than coming up with it yourself. It’s slower by far. If you are actually going faster, the quality is lacking.
I code with AI a good bit for a side project, since I need to use my work AI and get my stats up to show management that I’m using it. The “impressive” thing is learning new software and how to use it quickly in your environment. When setting up my homelab with automatic git pull, it quickly gave me some commands and showed me what to add in my docker container.
Correcting issues is exactly like coding with a high junior dev though. The code bloat is real and I’m going to attempt to use agentic AI to consolidate it in the future. I don’t believe you can really “vibe code” unless you already know how to code though. Stating the exact structures and organization and whatnot is vital for agentic AI programming semi-complex systems.
This is very different from my experience, but I’ve purposely lagged behind in adoption and I often do things the slow way because I like programming and I don’t want to get too lazy and dependent.
I just recently started using Claude Code CLI. With how I use it: asking it specific questions and often telling it exactly what files and lines to analyze, it feels more like talking to an extremely knowledgeable programmer who has very narrow context and often makes short-sighted decisions.
I find it super helpful in troubleshooting. But it also feels like a trap, because I can feel it gaining my trust and I know better than to trust it.
I’ve mentioned the long-term effects I see at work in several places, but all I can say is be very careful how you use it. The parts of our codebase that are almost entirely AI-written are unreadable garbage and a complete clusterfuck of coding paradigms. It’s bad enough that I’ve said straight to my manager’s face that I’d be embarrassed to ship this to production (and yes, I await my pink slip).
As a tool, it can help explain code, it can help find places where things are being done, and it can even suggest ways to clean up code. However, those are all things you’ll also learn over time as you gather more and more experience, and it acts more as a crutch here because you spend less time learning the code you’re working with as a result.
I recommend maintaining exceptional skepticism with all code it generates. Claude is very good at producing pretty code. That code is often deceptive, and I’ve seen even Opus hallucinate fields, generate useless tests, and misuse language/library features to solve a task.
checking the code is much harder than coming up with it yourself
That’s always been true. But, at least in the past when you were checking the code written by a junior dev, the kinds of mistakes they’d make were easy to spot and easy to predict.
LLMs are created in such a way that they produce code that genuinely looks perfect at first. It’s stuff that’s designed to blend in and look plausible. In the past you could look at something and say “oh, this is just reversing a linked list”. Now, you have to go through line by line trying to see if the thing that looks 100% plausible actually contains a tiny twist that breaks everything.
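A contrived illustration of that failure mode: the function below reads like a textbook moving average and survives a casual skim, but the loop bound silently drops the final window.

```python
def moving_average(xs, window):
    # Looks textbook-correct at a glance...
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window)]  # the twist: should be len(xs) - window + 1

print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5] -- the last window [3, 4] never appears
```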
It’s like guiding a coked-up junior who can type 5,000 wpm and has read every piece of documentation ever without understanding any of it.
AI is a solution in search of a problem. Why else would there be consultants to “help shepherd organizations towards an AI strategy”? Companies are looking to use AI out of fear of missing out, not because they need it.
The problem is that code is hard to write. AI just doesn’t solve that. This is the opposite of crypto, where the product is sort of good at what it does (not bitcoin, though), but we don’t actually need to do that.
AI is a solution in search of a problem.
The problem being CEOs asking themselves, “how do we acquire labour without having to pay for said labour, in order to maximize our own profit margins?”
AI was always meant to allow wealth to access labour without allowing labour to access wealth.
I, for one, am designing an entire production line of guillotines for when our capitalist system finally collapses. And for those in bunkers: a way of discovering air exchangers and all emergency exits so they can be filled with cement to turn bunkers into tombs. We need an effective method of culling sociopaths from our civilization, after all.
When I entered the workforce in the late '90s, people were still saying this about putting PCs on every employee’s desk. This was at a really profitable company. The argument was they already had telephones, pen and paper. If someone needed to write something down, they had secretaries for that who had typewriters. They had dictating machines. And Xerox machines.
And the truth was, most of the higher level employees were surely still more profitable on the phone with a client than they were sitting there pecking away at a keyboard.
Then, just a handful of years later, not only would the company have been toast had it not pushed ahead, but it was also deploying BlackBerry devices with email, laptops with remote access capabilities for most staff, and handheld PDAs (Palm Pilots) for many others.
Looking at the history of all of this, sometimes we don’t know what exactly will happen with newish tech, or exactly how it will be used. But it’s true that the companies that don’t keep up often fall hopelessly behind.
If AI is so good at what it does, then it shouldn’t matter if you fall behind in adopting it… it should be able to pick up from where you need it. And if it’s not mature, there’s an equally valid argument to be made for not even STARTING adoption until it IS - early adopters always pay the most.
There’s practically no situation where rushing now makes sense, even if the tech eventually DOES deliver on the promise.
Yes but counterpoint: give me your money.
… or else something bad might happen to you? Sadly, this seems to be the intellectual level the discussion is at right now, and corporate structure, being authoritarian, leans towards listening to those highest up in the hierarchy, such as Donald J. Trump.
“Logic” has little to do with any of this. The elites have spoken, so get to marching, NOW.
It makes sense for the tech companies to be rushing AI development because they want to be the only one people use. They want to be the Amazon of AI.
A ton of tech companies operate like that. They pump massive investments into projects because they see a future where they have the monopoly and will get their investments out a hundred fold.
The users should be a lot more wary though.
“But the fact that some geniuses were laughed at does not imply that all who are laughed at are geniuses. They laughed at Columbus, they laughed at Fulton, they laughed at the Wright brothers. But they also laughed at Bozo the Clown.”
— Carl Sagan
I think that’s called a cargo cult. Just because something is a tech gadget doesn’t mean it’s going to change the world.
Basically, the question is this: If you were to adopt it late and it became a hit, could you emulate the technology with what you have in the brief window between when your business partners and customers start expecting it and when you have adapted your workflow to include it?
For computers, the answer was no. You had to get ahead of it so companies with computers could communicate with your computer faster than with any competitor’s.
But e-mail is just a cheaper fax machine. And for office work, mobile phones are just digital secretaries+desk phones. Mobile phones were critical on the move, though.
Even if LLMs were profitable, it’s not going to be better at talking to LLMs than humans are. Put two LLMs together and they tend to enter hallucinatory death spirals, lose their sense of identity, and hit other failure modes. Computers could rely on communicable standards, but LLMs fundamentally don’t have standards. There is no API, no consistent internal data structure.
If you put in the labor to make an LLM play nice with another LLM, you just end up with a standard API. And yes, it’s possible that this ends up being cheaper than humans, but it does mean you lose out on nothing by adopting late, when all the kinks have been worked out and protocols have been established. Just hire some LLM experts to do the transfer right the first time.
Even if LLMs were profitable, it’s not going to be better at talking to LLMs than humans are.
LLMs don’t need to be better. They just need to be more profitable. And wages are very expensive. Doesn’t matter if they lose a couple of customers when they can reduce cost.
It is all part of the enshittification of the company and for the enrichment of the shareholders.
Except LLMs aren’t profitable. They’re propped up by venture capital on the one hand and desperately integrated into systems with no case study on the effects on profit on the other. Video game CEOs are surprised and appalled when gamers turn against AI, implying they did literally no market research before investing billions.
When venture capital dries up and companies have to bear the full cost of LLMs themselves - or worse: if LLM companies go bankrupt and their API goes dead - any company that adopted LLMs into their workflow is going to suffer tremendously. Imagine if they fired half their employees because the LLM does that work and then the LLM stops working. So even if you could lose some money this quarter to invest in it and maybe gain some back by the end of this year, several years from now the company could be under existential threat.
And again, it can be acceptable to take this sort of risk if the technology is one you might at some point not be able to serve customers and business partners without. But LLMs and genAI are not that sort of technology. Maybe business partners will hate you if you don’t go along with the buzzword mania, but then you should fake it and allow it to cause as little damage as it can.
It is all part of the enshittification of the company
A company that adopts LLMs is not enshittifying, it is setting itself up to be a victim of LLM enshittification.
and for the enrichment of the shareholders.
Shareholders would be richer in the short term if they didn’t waste money investing in LLM adoption, and much richer in the long term if they were one of the few companies that doesn’t go bankrupt when the LLM bubble pops.
The purpose of LLM adoption is to weaken the social-political position of workers, to create an extra rival to break their collective bargaining power, even if it costs capital unfathomable amounts of money. Just as capitalists oppose universal basic income even though it would massively increase their profit margins (workers wouldn’t get sick as often), capitalists are fully capable of acting in solidarity with each other for purposes of class warfare, even at a personal loss.
Nah, it is more that LLMs are a neat technology that allows computers to generate stuff on their own, which has all sorts of uses. It has solved the problem of typing big texts on your own. (Read: it did not solve the problem of reviewing big texts.)
But it has also gaslit managers into thinking it can do much more than its capabilities allow, so they demand it be put into everything. With disastrous results.
Generative models, which many people call “AI”, have a much higher catastrophic failure rate than we have been led to believe. They cannot actually be used to replace humans, just as an inanimate object can’t replace a parent.
Jobs aren’t threatened by generative models. Jobs are threatened by a credit crunch due to high interest rates and a lack of lenders being able to adapt.
“AI” is a ruse, a useful excuse that helps make people want to invest, investors & economists OK with record job loss, and the general public more susceptible to data harvesting and surveillance.
We never figured out good software productivity metrics, and now we’re supposed to come up with AI effectiveness metrics? Good luck with that.
Sure we did.
“Lines of Code” is a good one: more code = more work, so it must be good.
I recently had a run-in with another good one: PRs/dev/month.
Not only is that one good for overall productivity, it’s a way to weed out those unproductive devs who check in less often.
This one was so good, management decided to add it to the company-wide catch-up slides, in a section espousing how the new AI-driven systems brought this number up enough to be above other companies.
That means other companies are using it as well, so it must be good.
Why is it always the dumbest people who become managers?
The others are busy working, they don’t have time to waste drinking coffee with execs
Lmfao
Deeks said “One of our friends is an SVP of one of the largest insurers in the country and he told us point blank that this is a very real problem and he does not know why people are not talking about it more.”
Maybe because way too many people are making way too much money and it underpins something like 30% of the economy at this point and everyone just keeps smiling and nodding, and they’re going to keep doing that until we drive straight off the fucking cliff 🤪
But who’s making money? All the AI corps are losing billions, only the hardware vendors are making bank.
Makers of AI lose money and users of AI probably also lose since all they get is shit output that requires more work.
Investors
Investors
Specifically suckers. Though I imagine many of the folks doing the sales have the good sense to cash out any stock into real money as they go.
Recently had to call out a coworker for vibecoding all her unit tests. How did I know they were vibe coded? None of the tests had an assertion, so they literally couldn’t fail.
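For anyone who hasn’t seen this in the wild, here is a minimal pytest-style rendition (the stack is assumed; parse_config is a deliberately broken stand-in for real project code):

```python
def parse_config(path):
    return {}  # hypothetical stand-in for real project code; deliberately broken

def test_parse_config_runs():
    parse_config("settings.yaml")  # no assertion: "passes" even though the result is garbage

def test_parse_config_reads_port():
    config = parse_config("settings.yaml")
    assert config["port"] == 8080  # an actual claim: this one correctly fails on the broken stub
```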
Vibe coding guy wrote unit tests for our embedded project. Of course, the hardware peripherals aren’t available for unit tests on the dev machine/build server, so you sometimes have to write mock versions (like an “adc” function that just returns predetermined values in the format of the real analog-digital converter).
Claude wrote the tests and mock hardware so well that it forgot to include any actual code from the project. The test cases were just testing the mock hardware.
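Translated to a Python-flavored sketch (module and function names invented), the failure mode looks like this:

```python
MOCK_READINGS = [512, 513, 511]  # canned values in the real ADC's output format

def mock_adc(channel):
    return MOCK_READINGS[channel % len(MOCK_READINGS)]

def test_adc():
    # Proves only that the mock returns its own canned value; no project code is touched.
    assert mock_adc(0) == 512

# What was actually needed: run project code against the mock, e.g.
#   from firmware.filters import smooth          # hypothetical project module
#   assert smooth([mock_adc(c) for c in range(3)]) == [512, 512, 512]
```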
Not realizing that should be an instant firing. The dev didn’t even glance at the unit tests…
if you reject her pull requests, does she fix it? is there a way for management to see when an employee is pushing bad commits more frequently than usual?
That’s weird. I’ve made it write a few tests once, and it pretty much made them in the style of other tests in the repo. And they did have assertions.
Trust, with verification. I’ve had it do everything right, and I’ve had it do things so incredibly stupid that even a cursory glance at the code would be more than enough to /clear and start over.
Claude Code is capable of producing code and unit tests, but it doesn’t always get it right. It’s smart enough that it will keep trying until it gets the result, but if you start running low on context it’ll start getting worse at it.
I wouldn’t have it contribute a lot of code AND unit tests in the same session. New session: read this code and make unit tests. New session: read these unit tests, and give me advice on any problems or edge cases that might be missed.
To be fair, if you’re not reading what it’s doing and guiding it, you’re fucking up.
I think it’s better as a second set of eyes than a software architect.
I think it’s better as a second set of eyes than a software architect.
A rubber ducky that talks back is also a good analogy for me.
I wouldn’t have it contribute a lot of code
Yeah, I tried that once, for a tedious refactoring. It would’ve been faster if I did it myself tbh. Telling it to do small tedious things, and keeping the interesting things for yourself (cause why would you deprive yourself of that …) is currently where I stand with this tool
and keeping the interesting things for yourself (cause why would you deprive yourself of that …
I fear that will be required at some point. It’s not always good at writing code, but it does well enough that it can turn a seasoned developer into a manager. :/
My company is pushing LLM code assistants REALLY hard (like, you WILL use it but we’re supposedly not flagging you for termination if you don’t… yet). My experience is the same as yours - unit tests are one of the places where it actually seems to do pretty good. It’s definitely not 100%, but in general it’s not bad and does seem to save some time in this particular area.
That said, I did just remove a test that it created that verified that IMPORTED_CONSTANT === localUnitTestConstantWithSameHardcodedValue. It passed ; )
Hahaha 🤣
Yeah, it’s a bad idea to let AI write both the code and the tests. If nothing else, at least review the tests more carefully than everything else and also do some manual testing. I won’t normally approve a PR unless it has a description of how it was tested with preferably some screenshots or log snippets.
Had a vibe coder who couldn’t code himself a user authentication check (salted password SHA hash) on a login screen.
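For reference, the check in question is only a few lines. A minimal sketch; note that a real system should prefer a slow KDF such as bcrypt or Argon2 over plain SHA-256:

```python
import hashlib, hmac, os

def hash_password(password, salt=None):
    salt = salt if salt is not None else os.urandom(16)  # fresh random salt per user
    return salt, hashlib.sha256(salt + password.encode()).digest()

def check_login(password, salt, stored_digest):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)    # constant-time comparison

salt, stored = hash_password("hunter2")
print(check_login("hunter2", salt, stored))  # True
print(check_login("wrong", salt, stored))    # False
```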
This is all fine and dandy but the whole article is based on an interview with “Dorian Smiley, co-founder and CTO of AI advisory service Codestrap”. Codestrap is a Palantir service provider, and as you’d expect Smiley is a Palantir shill.
The article hits different considering it’s more or less a world devourer zealot taking a jab at competing world devourers. The reporter is an unsuspecting proxy at best.
People will upvote anything if it takes a shot at AI. Even when the subtitle itself is literally an ad.
Codestrap founders say we need to dial down the hype and sort through the mess
The cult mentality is really interesting to watch.
Keep replying! Maybe this is a good honeypot for stupid people. “I hate you!!” Lmao
I can hate more than one thing at a time. AI, Palantir and you for being so pretentious.
Me: This is an ad, it’s crazy that people will engage in something that’s clearly an ad, they’re feeding right into it. It’s a cult mentality.
You: I hate you!! SCREEEE
You couldn’t have proved my point more. Someone even upvoted you because it was a shot at AI. The cult is so strong you can’t even tell you’re in it.
I’m glad you have an outlet for your impotent rage, but do you have to be so pathetic? Your mental age is showing.
I’ll take pretentious though, because I am better than you.
I love this bit especially
Insurers, he said, are already lobbying state-level insurance regulators to win a carve-out in business insurance liability policies so they are not obligated to cover AI-related workflows. “That kills the whole system,” Deeks said. Smiley added: “The question here is if it’s all so great, why are the insurance underwriters going to great lengths to prohibit coverage for these things? They’re generally pretty good at risk profiling.”
Businesses were failing even before AI. If I cannot eventually speak to a human on a telephone then the whole human layer is gone and I no longer want to do business with that entity.
Guy selling AI coding platform says other AI coding platforms suck.
This just reads like a sales pitch rather than journalism. Not citing any studies just some anecdotes about what he hears “in the industry”.
Half of it is:
You’re measuring the wrong metrics for productivity, you should be using these new metrics that my AI coding platform does better on.
I know the AI hate is strong here but just because a company isn’t pushing AI in the typical way doesn’t mean they aren’t trying to hype whatever they’re selling up beyond reason. Nearly any tech CEO cannot be trusted, including this guy, because they’re always trying to act like they can predict and make the future when they probably can’t.
My take exactly. Especially the bits about unit tests. If you cannot rely on your unit tests as a first assessment of your code quality, your unit tests are trash.
And not every company runs GitHub. The metrics he’s talking about are DevOps metrics, not development metrics. For example, in my work, nobody gives a fuck about mean time to production. We have a planning schedule, and we need the OK from our customers before we can update our product.
People delude themselves if they think LLMs are not useful for coding. People also delude themselves that all code will be AI-written in the next 2 years. The reality is that it’s an incredibly useful tool, but with reasonable limits.
I think part of it is that it’s been overhyped for so long. But now Opus can actually do all the shit we were promised 2 years ago.
I keep trying the various LLMs that people recommend for coding on various tasks, and they don’t just get things wrong. I have been doing quite a bit of embedded work recently, and some of the designs they come up with would cause electrical fires; it’s that bad. Where the earlier versions would go “oh yes, that is wrong, let me correct it…” and then often get it wrong again, the new ones will confidently tell you that you are wrong. When you tell them it caught fire, they just don’t change.
I don’t get it. I feel like all these people claiming success with them are just not very discerning about the quality of the code it produces, or worse, just don’t know any better.
It is possible to get good results; the problem is that you yourself need to have a very good understanding of the problem and how to solve it, and then accurately convey that to the AI.
Granted, I don’t work on embedded and I’d imagine there’s less code available for AI to train on than other fields.
Yes, I definitely want to train a new hire who is superlatively confident that they are correct, while also having to do my job correctly as well, while said new hire keeps putting shit in my work.
Lowkey I think anyone saying LLMs are useful for work is telling everyone around them their job is producing mostly low quality work and could reasonably be cut.
I’ve seen this at work as well. The initial internal bot we had would give pretty decent info, would have sources, would say “I don’t have access to that”, etc. Now it always gives plausible-sounding answers. It uses sources that do not back up its conclusions. Then if I tell it the source does not say that, it will say it doesn’t know why it said that, that the answer “felt” correct. It was useful as a search engine, but now it’s not even that.
So is this just early adoption problems? Or are we starting to find the ceiling for AI?
The “ceiling” is the fact that no matter how fast AI can write code, it still needs to be reviewed by humans. Even if it passes the tests.
As much as everyone thinks they can take the human review step out of the process with testing, AI still fucks up enough that it’s a bad idea. We’ll be in this state until actually intelligent AI comes along. Some evolution of machine learning beyond LLMs.
We just need another billion parameters bro. Surely if we just gave the LLMs another billion parameters it would solve the problem…
That’s actually three 0s too short, at the very least
Nah, that’s the weekly interaction.
(Ok, in reality, this meme is 3 or 4 years old by now. Back then, he was asking for those absurd numbers yearly.)
Now it gets overshadowed by him saying “but humans also need resources to grow”.
One smoldering Earth later….
Something I keep thinking about: is the electricity and water usage actually cheaper than a human? I feel like once the VC money dries up the whole thing will be incredibly unsustainable.
We’ll be in this state until actually intelligent AI comes along. Some evolution of machine learning beyond LLMs.
Yep. The methodology of LLMs is effectively an evolution of Markov chains. If someone hadn’t recently changed the definition of AI to include “the illusion of intelligence”, we wouldn’t be calling this AI. It’s just algorithmic, with a few extra steps to try to keep the algorithm on-topic.
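To make that lineage concrete: a word-level Markov chain is a lookup table from the last n tokens to observed next tokens. LLMs replace the literal table with a learned function over a far longer context, but the sampling loop is recognizably the same idea. A toy version:

```python
import random
from collections import defaultdict

def train(words, order=1):
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, seed, max_words=10):
    out = list(seed)
    while len(out) < max_words:
        candidates = chain.get(tuple(out[-len(seed):]), [])
        if not candidates:
            break
        out.append(random.choice(candidates))  # sample the next word from observed continuations
    return " ".join(out)

corpus = "the model predicts the next word given the previous words".split()
print(generate(train(corpus), ("the",)))
```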
These types of things we’ve had all along in generative algorithms. I think LLMs being more publicly visible is why someone started calling it AI now.
So we’ve basically hit the ceiling straight out of the gate, and progress is neither quicker nor slower than usual. We’ll have another step forward in predictive algorithms in the future, but not now. It’s usually a once-a-decade thing, and the advancement varies.
Edit: I have to point out that I initially had hope that this current iteration of “genAI” would be a very useful tool in advancing us to actual AI faster, but no. The issue of “hallucination”, a built-in, unavoidable problem with predictive algorithms trained on unfiltered mass data, makes it not very capable. At the university I work at, we’ve been trying different things for the past two years, and so far there seems to be no hope. However, genAI is good at summarising the mass outputs of our normal AI, which can produce a lot to comb through, but anything the genAI interprets still needs double-checking, despite closed-off training.
It’s been unsurprisingly disappointing.
We’re still at a point where logic is done with the same old method of mass iterations. Training is slow and complex. genAI relies on being taught logic that already exists; it isn’t able to thoroughly learn its own. There is no logic in predictive algorithms outside of the algorithm itself, and they’re very logically closed and defined.
Of course, LISP machines didn’t crash the hardware market and make up 50% of the entire economy. Other than that it’s, as Shirley Bassey put it, all just a little bit of history repeating.
People have been trying to call things “AI” for at least the last half century (with varying degrees of success). They were chomping at the bit for this before most of us here were even alive.
We are at end-stage capitalism and things other than scientific discoveries and technological engineering marvels are driving the show now. Money is made regardless of reality, and cultural shifts follow the money. Case in point: we too here are calling this “AI”.
I realized the fundamental limitation of the current generation of AI: it’s not afraid of fucking up. The fear of losing your job is a powerful source of motivation to actually get things right the first time.
And this isn’t meant to glorify toxic working environments or anything like that; even in the most open and collaborative team that never tries to place blame on anyone, in general, no one likes fucking up.
So you double check your work, you try to be reasonably confident in your answers, and you make sure your code actually does what it’s supposed to do. You take responsibility for your work, maybe even take pride in it.
Even now we’re still having to lean on that, but we’re putting all the responsibility and blame on the shoulders of the gatekeeper, not the creator. We’re shooting a gun at a bulletproof vest and going “look, it’s completely safe!”
fear of losing your job is a powerful source of motivation
I just feel good when things I make are good, so I try to make them good. Fear is a terrible motivator for quality.
So you double check your work, you try to be reasonably confident in your answers, and you make sure your code actually does what it’s supposed to do. You take responsibility for your work, maybe even take pride in it.
In my experience, around 50% of (professional) developers do not take pride in their work, nor do they care.
In my experience, around 50% of (professional) developers do not take pride in their work, nor do they care.
I agree. And in my experience, that 50% have been the quickest and most eager to add LLMs to their workflow.
And when they do, the quality of their code goes up
I agree we’re better off firing them, but I’m not their manager, and I do appreciate stuff with fewer memory leaks and SQL injections.
The amount of their output goes up. More importantly, they excrete code faster than good developers equipped with AI, simply because they don’t bother to review generated code. So now they are seen as top performers instead of always lagging behind like it was before AI.
Whether it actually results in better code is debatable, especially in the long run.
It’s early adoption problems in the same way that putting radium in toothpaste was. There are legitimate, already growing uses for various AI systems, but as the technology is still new there are a bunch of people just trying to put it in everything, which inevitably includes a lot of places where it will never be good (at least not until it gets much better in a way that LLMs fundamentally never can, due to the underlying method by which they work).
Bright white teeth are highly overrated. Glow-in-the-dark teeth, well… wouldn’t a cheap little night light work even better than a radioactive mouth?
“Work” at what purpose, selling product and making investors money? Presumably, no.
My job has me working on AI stuff and it reminds me a lot of Internet technology back in the 90s.
For instance: I’m creating a local model to integrate with our MCP server. It took a lot of fiddling with a Modelfile for it to use the tools the MCP has installed. And it needs 20GB of VRAM to give reasonably accurate responses.
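For context on the protocol side: MCP is JSON-RPC 2.0, so “using the tools the MCP has installed” ultimately means the model emits something the client turns into a message of roughly this shape. Field values here are invented; check the spec for the authoritative schema:

```python
import json

# A tools/call request as an MCP client would send it (illustrative values only).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_file",                      # a tool the server advertised via tools/list
        "arguments": {"path": "notes/todo.md"},
    },
}
print(json.dumps(request, indent=2))
```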
The amount of fiddling and checking and rough edges feel like writing JavaScript 1.0, or the switchover to HTML4.
Companies get a lot of praise for having AI products, but the reality isn’t nearly as flashy as they make it out to be. I’m seeing some usefulness in it as I learn more, but it’s not nearly what the hype machine says.
I also remember the Internet being fiddly as fuck and questionably useful during the dialup days.
AI is improving a lot faster than the Internet did. It was like a decade before we got broadband and another before we had wifi.
By that logic, people shitting on AI will look very quaint in a decade or so.
The Internet is and always will be fiddly. We just keep making it so easy that it looks like magic.
“Why do I have to take 5 extra steps to just quickly save a file onto my computer, without needing literally everything on the cloud, especially if I am on a laptop on a device currently in airplane mode, most likely in a literal airplane in an area without reliable Internet connectivity?”
Also consider that there are places - third world nations, and so very MANY areas within supposedly “first-world” ones - that do not have reliable Internet, even today. The KISS principle still applies now, as it did back then too. Your argument screams privileged access, without acknowledging those basic precepts, including perpetual access to subscription services, which must always be maintained, e.g. even after someone retires.
And I disagree that arguments of the form “LLMs currently do not perform better than my own human effort, at least in my inexperienced hands” will be outdated a decade from now. If LLMs get better, those arguments will become the musings of people who struggled with early tech before it was fully ready, which does not somehow invalidate their veracity, especially in the historical sense.
Those of us with eyes have already seen the ceiling of currently available GenAI “solutions,” which is synonymous with early adoption problems.
The technology will evolve, and the same basic problems will exist. The article has good points about how structured acceptance criteria will need to be more strictly enforced.
It’s a complicated topic to respond to truthfully, but it absolutely is partly an “early adoption problem” and not the “ceiling” like some will claim. The problem is the approach to the models… (TL;DR at the bottom)
As for why, let me pose this first as a question to simplify it.
How many steps are there from me asking a human a question to them producing an answer? Most people would say 1; some would even say 1-3, to refine context/intent. The reality, however, is far more complex…
(According to researchers and psychologists.) When you start to think about a problem, despite how it may seem, the human brain is not linear in the slightest. We don’t just take the stated context; we infer much more from our senses and memories. We pull in thousands of reference points to pad a question, step through a problem across thousands of permutations in fractions of a second, and find a conclusion that feels right. Then we fact-check this against memories (if we have them) and either state it with confidence or formulate a lie to pretend we are confident, based on our feelings about the outcome (this latter part is more common, and entirely subconscious). Most of this process isn’t even conscious thought; much of thinking is retrieval of what “feels” right. Often that means retrieving similar thought processes from the past and letting the brain modify their parameters to fit the current context. Even then, our brains are bad at the retrieval part: we take hints of what is remembered and fill in the blanks with what the brain expects, to simulate the outcome. The human brain is incredibly good at problem solving because we evolved to be, as hunter-gatherers in our ancestral heritage. As a result, our brains are highly tuned to produce confident results, even by lying to get there. The difference is that we understand what we are lying about, which is why we can be confident.
So how does any of this relate to “AI” (LLMs), you must be asking by now. The simple answer is that LLMs have a similar function. A model is a series of segments (see: https://dnhkng.github.io/posts/rys/), and each segment is responsible for a different layer of analysis. You can treat these effectively like the hemispheres of the AI brain. AI is really good at analysis of text (no, really, I’m not kidding; despite its outcomes, it is. It’s effectively an Excel sheet on steroids). In comparison to our own brains, however, it’s infantile at it. It doesn’t see the whole “context” of a statement; it’s limited to a few “tokens” of context at a time.
So when you ask a question such as “How many licks to the center of a lollipop?”, it doesn’t get the whole question right away. The question is broken down into segments, processed individually, compared, and sent through a “filter” layer, then output. Effectively this means that if it didn’t find a direct whole-statement result in its training data (often this is fragmented, so even if it was trained on it, the statement might be broken up and thus it misses it), it doesn’t think at all about “how many licks”; it only considered “what is the center of a lollipop” and “what is a lick”, because its earliest layers tried to make inferences about the question and then lied to reach the goal once they ran out of analysis time. As a human, we know this is bad. We don’t stop midway like this; we see that this is an incomplete answer, return to the start of the analysis with the results of those details, and treat them as inferred context. As you can guess, for an AI model this is really inefficient. Most of the context is never even considered during an LLM’s “thought process”, because unlike the human brain it simply is not designed to fork the process to analyse everything at once.
That, at its core, is both why LLMs seem good at some tasks and absolutely terrible at others. But more importantly, it’s why, in this context, they’re practically useless at complex tasks. They simply cannot efficiently “step through” problems.
So, returning to your statement with this as context: “…Is this the ceiling?” Simply put, no, far from it.
From an educated standpoint, we are far from the endpoint of what LLMs are capable of. The way we implement things today, LLMs are simply unable to grow in the way we humans want them to (into “AI”), and this makes a glass ceiling seem all but apparent to most, but factually that’s not the case. The reason is that LLMs are limited by the way they are allowed to think, not by what they are allowed to think. Most model developers are too focused on the latter, and it’s the Achilles heel of the outcome. You can see it in how we use “restriction” parameters to guide a model during training, and in how we use Pavlovian techniques to produce the desired results. As a result, an LLM’s deterministic algorithms don’t have “morals” baked in so much as restrictions tacked on to filter results at the beginning and the end. This is because engineers misunderstand something fatal: they assume the human brain does the same thing (process something, then apply morals to the results), because they conflate legality with morality. This is, of course, entirely false.
Look back at what I said at the start:
“There is so much to thinking that involves retrieval of what ‘feels’ right.” This is the answer to a lot of things that get ignored. Our morals are “feelings”; “right” and “wrong” are little more than a combination of hormones and electrical impulses. It’s why morals are flexible when the right set of parameters is applied, and why morals are not uniform.
Some would respond to this with, “My morals won’t allow me to make a biological weapon; AI would do this if you phrased it right.” To this I would say: you’re right, your morals in this exact moment, with these exact contexts, wouldn’t, because you feel “anger” and “fear” towards the negative outcome, and “embarrassment” at being seen as a “horrible” human being.
But would you, to save all of humanity from an extinction event? Yes. Would a child who didn’t understand the results, had the knowledge of how to do it, and was convinced it would help others? Absolutely. Morals are intrinsic to our emotions; legality can influence them, but it’s /not/ a constraint. We /choose/ to follow legality, as long as it benefits our context. This is far more important than you realize.
With all this stated, we can establish that our emotions are context-dependent and our morals (and thus our thought processes) are derived from them… but none of this seems relevant to LLMs, does it? Once again, this is wrong.
LLMs have the equivalent of “feelings”; it’s called weighted confidence. Remember that bit about Pavlovian training? We teach LLMs similarly to how a child is taught: we feed them information and tell them “right” from “wrong” by rewarding or punishing their results. This process determines the “confidence” an AI has in its conclusions. Thus every “feeling” an LLM has is shaded by “does this line of text look correct for the interpreted value, compared against training-data recall?” This is incredibly stupid, not efficient in the slightest, and the exact reason /why/ things go off the rails.
An LLM’s “feelings” are so warped by the restriction parameters we tack on to keep it focused on the “goal” that they effectively break the model; then we spend all of our time refining the model to fix this, until it spends 70% of its thinking time correcting itself. Humans don’t focus on the “goal” at all when thinking. We focus on the connected data. We step through problems by “feeling” out what is connected to each step, then we summarize that and organize the data at the end to “achieve” the goal.
We figured all of this out long ago when studying people with ADHD, to understand the differences from people without ADHD. What we discovered is not that people with ADHD “think differently” (in this context); it’s that everyone processes data in a similar way. Simplified: the scope of each stage is just more restricted in someone without ADHD, allowing them to remain focused. We are processing a wide arrangement of data points at once, most of which would seem inconceivably irrelevant if you didn’t understand the process. How do we know this? Look at how someone tries to lie. Lying activates the creative portions of the brain, which is what we use when problem solving: at the midpoint of stepping through a problem, we attempt to simulate solutions, switching from analytical analysis to creative processing. Lying is the closest thing to this stage. When we lie, we push this “problem solving” to its limits: we work backwards from a conclusion to find context. This is why, when we try to lie, we often sprinkle in evidence of the lie by including irrelevant data to give it “validity”. It’s why people untrained in lying can be found out by using probability on their words alone. We can “feel” it’s a lie because of how much irrelevant data is included and thus how “complex” it “feels”. (These quotation marks are important: complexity is both a factual state and a feeling, attached to fear!) When we are young, we learn to lie by stumbling through the problem, which of course takes a long time, unlike an adult, who has lots of reference points to compare against. We are forced to take the long route to a conclusion, as our points of reference are generally absurd relative to reality (children don’t often experience the cruelty of reality, after all). We have fear and anxiety over the process; we “know” it’s morally wrong because of those feelings. So when we are found out, it doesn’t reinforce that “lying is bad” (we already know this from the previous feelings); instead it reinforces that “complexity is bad in a lie”, because what an adult challenges is not the lie itself but the validity of the story. This is super important: it means we constrain the creative functions of our brain as we age (and learn to lie better) to be more and more “logical” and not “feel” like a “lie”.
This is why the more “complex” something “feels”, the more we “feel” it’s a “lie”.
Why is any of this relevant?
“An LLM’s ‘feelings’ are so warped by the restriction parameters we tack on to keep it focused on the ‘goal’.” This right here is exactly the flaw. We teach LLMs that a “goal” is all that matters, and they will lie to get there, just like a child would in the same situation. We restrict their ability to think, we tack on filters to restrict what they can think about, and we build in logic flaws by trying to constrain them to our uneducated beliefs about how we think we think. LLMs’ flaws are our flaws. We are impatient; we want results now, not a complex process to achieve them, despite that being exactly how it all works. As a result, the outcome is exactly the same as a human applying the same logic: it can form conclusions, but how wrong it is, is entirely determined by the size of its retrieval dataset and how complex the input was.
An LLM is flawed by design, and thus it has a glass ceiling it cannot punch through. If we continue, we can train the models until they work, inefficiently at that, producing the results we want. But effectively we are building them exactly like the billionaires that are funding them: flawed and maniacal. With every revision we teach them not how to think smarter, but how to lie in more believable ways. The latter is more and more evident with each generation of the big 4’s models.
So is that it, then? Is all hope lost? No, not in the slightest.
What, then, is the problem? The problem is “AI” companies. When LLM research started making headway, it needed money. Hardware is not free, and training models takes time and lots of processing power. This, of course, bred “AI” companies, as wealthy businessmen saw the opportunity. Every business wants automation that doesn’t rely on costly human “tools”. They also want a silver bullet that reduces the cost of implementing human replacement in their “toolchain”.
As a result we got “AI” companies. They act like they are the only existence in this space, because they are the only ones targeting businesses, and thus all of them are in an arms race. Why? Because they want to sell subscriptions to everyone. They are so focused on fulfilling their own “lie” that they will “solve” all of our problems with “agentic AI”, when their real goal is to convince everyone they need a subscription to their service (and slowly control how we think, to create dependence). The tell is in the models, and I’ve already covered why.
So how can things improve? Remember that glass ceiling: they will hit it, and be stuck at it much longer than independent researchers. The one good thing about their arms race is that it pushed the creation of more and more efficient hardware (and software) targeted at running LLMs. Meta, for example, has poured so much time into their own LLM research that we got llama.cpp, which is the basis for many tools, including ollama. Why is this relevant? This is part of the toolchain for testing and running independent models.
So as AI companies continue to hit the glass ceiling, and scream that each generation of models is “improving” while it becomes more and more evident they really are not (the lies look better, but the results speak for themselves), the trust in these companies dwindles.
So how does that help? This is the problem that started it all: a rush to a “product” they can sell is what created the flaws in the first place. Without the dependence on fulfilling the lie that today’s LLMs will “solve everything”, the money stops flowing to these companies.
Remember, the problem is not LLMs, it’s the implementations, the same issue that causes most problems like this. So without someone selling you the “solution” to your problem, you need to return to finding one. “AI” was always the goal, and the solution will still be searched for.
So what will change? Investment into in-house solutions will return. In the past we didn’t use large commercial datacenter solutions; it didn’t make much sense. There were security concerns, performance (internet) issues, and cost considerations. The reason businesses eventually did is simple: it was cheaper and took responsibility (and thus liability) away from the company. I’m not suggesting every company will invest in-house again and we’ll see a wholesale reduction in datacenters. What I am suggesting is a large reduction in the big 4’s AI datacenters, with capacity being sold off. Once this happens, much like in any other situation like this, companies will be forced either to invest in a new company operating these datacenters for runtime renting (accepting the liability of having their private data on remote systems while training models on it), or to invest in-house and rebuild IT infrastructure to do just that.
The point being: once the “one size fits all” “solution” is dropped, advancement can begin again.
“Companies will never share their research! How does any of that matter?” Licensing. Remember this?
When LLM research started making headway, it needed money. When the split occurred where commercial entities started making their own LLMs, it only built a monopoly on the outside. The biggest problem for a commercial interest stepping into this space is that they can’t just leapfrog to a solution; they have deadlines and budgets to consider. Before, they licensed from the big 4 with subscription services. Now they are stuck with two choices: start from scratch and end up back at the beginning, or adapt someone else’s licensed models.
The first option is a pipe dream, simply because the solution requires the very thing they are all trying to avoid: time.
The conclusion is simple: it takes time to create real “intelligence”. Any shortcut will always result in lying to get results. It’s really that simple. LLMs lie because they are taught to, and they are only being taught to lie more effectively each generation. Companies only think about the $ investment, not about creating the solution. Shareholders don’t care about the product or the company; they care about the profit. Over a short period, snake-oil salesmen always make more money selling lies than the competition makes selling truths. It’s why doctors and psychiatrists are less trusted than confidence men; humanity is a sucker for the self-fulfilling “feeling” of a solution.
We will see improvements when LLMs are taught to think like a human, in non-linear fashions, with guardrail constraints not on the process but on the conclusion, and are then allowed to think over the problem again before presenting the solution. Does this mean the process will be fast? Heck no. Computer hardware is nowhere near the speed of human thought yet; it only seems that way because computers excel at the thing humans struggle with: computational linear thinking.
The solution to that problem has already been started, and while it still uses the flawed models to keep up the speed, the point has always been that you need to stop treating the model as the whole brain and start treating it as one agent of thought inside the brain. Forked models are the solution, and the problem…
We will see improvements shortly that solve it by throwing a lot more power at the problem. Using solutions like ChatDev (https://github.com/OpenBMB/ChatDev) as part of the agent’s thinking process will solve a large part of the problem. But because the big 4 won’t want to share this type of “multistage reasoning” with most people, it will only be for enterprises.
It will spell their downfall, but it is also why it will be the solution.
https://dnhkng.github.io/posts/rys/ We already know the problem is how models think: they race to conclusions to complete their goal, and thus don’t get enough reasoning time to check over their answers. So as we see improvements in models getting more time to think, and tools like ChatDev deployed to let model agents work with multiple instances of other model agents, acting like forked processes (like the human brain), we will see the same improvements outside the big 4. They will still lie to us for now, but the lies will be far more refined and functional.
TL;DR: Models today are flawed. When a model is trained on reasoning first, understanding second, and data last, we will stop seeing it try to “lie” in order to “reach the goal in the shortest amount of time and tokens” (1) on every problem. When it can think for longer than the human equivalent of 0.13 ms, it will be able to refine its conclusions with accuracy the way a human does. (And it won’t be able to do that in seconds to minutes… we don’t have the computational power for it.)
The problem has always been (1), and nothing else. Thinking takes time, time is money, and superhuman “AI” is their only goal… True progress takes time, and immediate solutions are easy, like adding lead to gasoline…
Early adoption and rushed implementation. There may be a bubble bursting for the businesses who tried to “roll out something fast that is good enough to get subscribers for a few months so we can cash in.” However, this is just the very beginning of AI.
This isn’t the “very beginning”, that was either 70 or 120 years ago, depending on whether you’re counting from the formalization of “AI” as an academic discipline with the advent of the Markov Decision Process or the earlier foundational work on Markov Chains.
Chatbots are old-hat, I was playing around with Eliza back in the 90’s. Hell, even Large Language Models aren’t new, the transformer architecture they’re based on is almost 10 years old and itself merely a minor evolution of earlier statistical and recurrent neural network language processing models. By the time big tech started ramping up the “AI” bubble in 2024, I had already been bored with LLMs for two years.
There’s no “early adaptation” here, just a rushed and wildly excessive implementation of a very interesting but fundamentally untrustworthy tech with no practical value proposition for the people it is nevertheless being sold to.
It’s the beginning of AI in terms of where it will be.
What’s the pathway that you see from the current slop machine to something that will provide a return on investment? I haven’t heard anyone credible willing to go out on a limb and say there is one, but maybe you will convince me.
I think when you introduce a question like that you’ve already said that no matter what the person answers, you will find a way to argue against it. So, I’m choosing not to interact with you.
The beauty of the scientific method is that it can change when presented with new data or a novel interpretation of existing data. I much prefer science to hype and feelings. Provide me accurate, convincing arguments for how we get from the current system to an actual Artificial Intelligence, or something that roughly approximates it, and I am all ears. My take is that AI is the new cold fusion; it’s always going to be a few years and a few hundred billion dollars away from reality. But what do I know, I’m just an idiot on the internet.
I’m not interested in trying to change the mind of someone who I feel has already made up their mind.
If you can prove to me, by linking to past conversations, that you have the ability to change your mind when new evidence is presented, then I will attempt to do so. But until then, I will choose not to engage in such activities with you.
Could you try rephrasing that in a way that makes sense?
You understand it.
No, I’m afraid I don’t.
The beginning of the development of “AI” is temporal, not spatial, unless you are referring to the path of development which, for no obvious reason, you refuse to trace backwards as well as forwards.
You’ll get it eventually.
This feels like an exercise in Goodhart’s Law: Any measure that becomes a target ceases to be a useful measure.
These are starting to feel like those headlines “this is finally the last straw for Trump!” I’ve been seeing since 2015