The great software quality collapse or, how we normalized catastrophe
2018 isn't "the start of the decline", it's just another data point on a line that leads from, y'know, Elite 8-bit on a single tape in a few Kb through to MS Flight Simulator 2020 on a suite of several DVDs. If you plot the line it's probably still curving up and I'm not clear at which point (if ever) it would start bending the other way.
This is partly related to the explosion of new developers entering the industry, coupled with the classic "move fast and break things" mentality, and further exacerbated by the current "AI" wave. Junior developers don't have a clear path to becoming senior developers anymore. Most of them will overly rely on "AI" tools due to market pressure to deliver, stunting their growth. They will never learn how to troubleshoot, fix, and avoid introducing issues in the first place. They will never gain insight, instincts, understanding, and experience beyond what is acquired by running "AI" tools in a loop. Of course, some will use these tools to actually learn and become better developers, but I reckon that most won't.
So the downward trend in quality will only continue, until the public is so dissatisfied with the state of the industry that it causes another crash similar to the one in 1983. This might happen at the same time as the "AI" bubble pop, or they might be separate events.
The #1 security exploit today is tricking the user into letting you in, because attacking the software is too hard.
Yes, many/most systems now offer some form of authentication, and many offer MFA, but look at the recent Redis vulns: there are thousands of Redis instances vulnerable to RCE just sitting on the public internet right now.
I suppose it could be quantified by the amount of financial damage to businesses. We can start with high-profile incidents like the CrowdStrike one that we actually know about.
But I'm merely speaking as a user. Bugs are a daily occurrence in operating systems, games, web sites, and, increasingly, "smart" appliances. This is also more noticeable since software is everywhere these days compared to a decade or two ago, but based on averages alone, there's far more buggy software out there than robust and stable software.
Writing code is artistic the same way plumbing is artistic.
Writing code is artistic the same way home wiring is artistic.
Writing code is artistic the same way HVAC is artistic.
Which is to say, yes, there is satisfaction to be had, but companies don't care as long as it gets the job done without too many long-term problems, and never will care beyond that. What we call tech debt, an electrician calls aluminum wiring. What we call tech debt, a plumber calls lead solder joints. And I strongly suspect that one day, when the dust settles on how to do things correctly (just like it did for electricity, plumbing, flying, haircutting, and every other trade eventually), we will become a licensed field. Every industry has had that wild experimentation phase in the beginning, and has had that phase end.
I do see it as more of a craft than a typical trade. There are just too many ways to do things to compare it to e.g. an electrician. Our industry does not have (for better or for worse) a "code" like the building trades, or even any mandated way to do things, and any attempts to impose one (cough cough Ada, etc.) have in fact been met with outright defiance and contempt.
When I'm working on my own projects -- it's a mix of both. It's a more creative endeavour.
If we look at most trades historically:
- Electricians in the 1920s? Infinite ways to do things. DC vs AC wars. Knob-and-tube vs conduit vs armored cable. Every electrician had their own "creative" approach to grounding. Regional variations, personal styles, competing philosophies. Almost all of those other ways are gone now. Early attempts to impose codes on electricians and electrical devices were disasters.
- Plumbers in the 1920s? Lead vs iron vs clay pipes. Every plumber had their own joint compound recipe. Creative interpretations of venting. Artistic trap designs. Now? Why does even installing a basic pipe require a license? We found out after enough cholera outbreaks, methane explosions, and backed-up city sewer systems.
- Doctors in the 1920s? Bloodletting, mercury treatments, lobotomies, and their own "creative" surgical techniques. They violently resisted the American Medical Association, licensing requirements, and standardized practices. The guy who suggested handwashing was literally driven insane by his colleagues.
We're early, not special. And just like society eventually had enough of amateur electricians, plumbers, and doctors in the 1920s, they'll have enough of us too. Give it 40 years, and they'll look at our data breaches and system designs the same way we look at exposed electrical wiring: obviously insane, no matter how many warnings came with it.
I always say that code quality should be a requirement like any other. Many businesses are fine with rough edges and cut corners if it means things are sort of working today rather than being perfect tomorrow. Other businesses have a lower tolerance for failure and risk.
I do see it as more of a craft than a typical trade. There are just too many ways to do things to compare it to e.g. an electrician.
There are sooo many ways to get electricity from one point to another. The reason that a lot of those options are no longer used is not because they don't exist but because they were legislated out. For example, if you want to run wild just run a single "hot" wire to all your outlets and connect each outlet's neutral to the nearest copper plumbing. Totally esoteric, but it would deliver electricity to appliances just fine. Safety is another matter.
companies don't care as long as it gets the job done without too many long-term problems
Companies don't care as long as it gets the job done without too many VERY SHORT TERM problems. Long term problems are for next quarter, no reason to worry about them.
The problem isn't that companies make these tradeoffs. It's that we pretend we're not in the same boat as every other trade that deals with 'good enough' solutions under real-world constraints. We're not artists, we're tradesmen in 1920 arguing about the best home wiring practices. Imagine what it would be like if they were getting artistic about their beautiful knob-and-tube installations and the best way to color-code a fusebox; that's us.
Hell there was a whole TikTok cycle where people learned there is a right and wrong way to lay tile/grout. One way looks fine until it breaks, the other lasts lifetimes.
It’s the exact same trend as in software: big, shitty home builders hire crap tradespeople to build cheap slop houses for suckers, houses that require extensive ongoing maintenance. Meanwhile there are good builders and contractors that build durable quality for discerning customers.
The problem is exploitation of information asymmetries in the buyer market.
The trades did and still do have standards.
Yes, they do; after regulation, and after the experimentation phase was forcibly ended. You can identify 'right and wrong' tile work, precisely because those standards were codified. This only reinforces my point: we're pre-standardization, they're post-standardization, and most pre-standardization ideas never work out anyway.
Writing code is artistic the same way writing text is.
Whether that is a function call, an ad, a screen script, a newspaper article, or a chapter in a paperback the writer has to know what one wants to communicate, who the audience/users will be, the flow of the text, and how understandable it will be.
Most professionally engaged writers get paid for their output, but many more simply write because they want to, and it gives them pleasure. While I'm sure the jobs can be both monetarily and intellectually rewarding, I have yet to see people who do plumbing or electrical work for fun?
Writing code is artistic the same way home wiring is artistic.
Instead of home wiring, consider network wiring. We've all seen the examples of datacenter network wiring, with 'the good' being neat, labeled and easy to work with and 'the bad' being total chaos of wires, tangled, no labels, impossible to work with.
IE. The people using the datacenter don't care as long as the packets flow. But the others working on the network cabling care about it A LOT. The artistry of it is for the other engineers, only indirectly for the customers.
Agile management methods set up a non-existent release method called "waterfall" as a straw man, where software isn't released until it works, practically eliminating technical debt. I'm hoping someone fleshes it out into a real management method. I'm not convinced this wasn't the plan in the first place, considering that the author of Cunningham's law ("The best way to get the right answer on the Internet is not to ask a question; it's to post the wrong answer.") was a co-signer of the Agile Manifesto.
It'll take a lot of work at first, especially considering how industry-wide the technical debt is (see also: https://xkcd.com/2030/), but once done, having release-it-and-forget-it quality software would be a game changer.
Then I had to get familiar with the new stuff: waterfall, agile, whatever.
They literally are all nothing but hacks that violate the basic points of actual project management (e.g. that projects have a clear end).
a non-existent release method called "waterfall" as a straw man
The person that invented the name never saw it, but waterfall development is extremely common and the dominant way large companies outsource software development even today.
The only thing that's changed is that those companies now track the implementation of the waterfall requirements in scrum ceremonies. And yes, a few more places actually adopted agile.
Current tools seem to get us worse results on bug counts, safety, and by some measures even developer efficiency.
Maybe we'll end up incorporating these tools the same way we did during previous cycles of tool adoption, but it's a difference worth noting.
If you plot the line it's probably still curving up and I'm not clear at which point (if ever) it would start bending the other way.
I suspect when Moore's law ends and we cannot build substantially faster machines anymore.
The systems people worry more about memory usage for this reason, and prefer manual memory management.
... memory and cpu performance have improved at completely different rates.
This is overly simplified. To a first approximation, bandwidth has kept pace with CPU performance, while main memory latency is basically unchanged. My 1985 Amiga had 125ns main-memory latency, though the processor itself saw 250ns latency - current main memory latencies are in the 50-100ns range. Caches are what 'fix' this discrepancy.
You would need to clarify how manual memory management relates to this... (cache placement/control? copying GCs causing caching issues? something else?)
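A minimal sketch of the latency-vs-bandwidth point, with no claims about manual memory management: a sequential pass is prefetch-friendly and roughly bandwidth-bound, while a random gather over the same data is dominated by main-memory latency that caches can no longer hide. The array size and the use of numpy are arbitrary assumptions for illustration.

    # Rough illustration only (not a rigorous benchmark): sequential access
    # streams from memory and is bandwidth-bound; a random gather over the
    # same array is dominated by cache misses and main-memory latency.
    import time
    import numpy as np

    N = 50_000_000                        # ~400 MB of int64, far bigger than any cache
    data = np.arange(N, dtype=np.int64)
    perm = np.random.permutation(N)       # random access order

    t0 = time.perf_counter()
    seq = data.sum()                      # sequential, prefetch-friendly
    t1 = time.perf_counter()
    rnd = data[perm].sum()                # random gather, latency-bound
    t2 = time.perf_counter()

    print(f"sequential: {t1 - t0:.2f}s   random gather: {t2 - t1:.2f}s")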
We're adding transistors at ~18%/year. That's waaaaay below the ~41% needed to sustain Moore's law.
Even the "soft" version of Moore's law (a description of silicon performance vs. literally counting transistors) hasn't held up. We are absolutely not doubling performance every 24 months at this point.
Most importantly, never ever abstract over I/O. Those are the ones that leak out and cause havoc.
It's become so repetitive recently. Examples from this post alone:
1. "This isn't about AI. The quality crisis started years before ChatGPT existed."
2. "The degradation isn't gradual—it's exponential."
3. "These aren't feature requirements. They're memory leaks that nobody bothered to fix."
4. "This wasn't sophisticated. This was Computer Science 101 error handling that nobody implemented."
5. "This isn't an investment. It's capitulation."
6. "senior developers don't emerge from thin air. They grow from juniors who:"
7. "The solution isn't complex. It's just uncomfortable."
Currently this rhetorical device is like nails on a chalkboard for me.
Anyway, this isn't a critique of your point. It's pedantry from me. :)
It seems like everyone is getting too worked up about AI generated text. Yes, it's bad, but bad writing has existed forever. We don't see most of the older stuff because it disappears (thankfully) into oblivion and you are left with the works of Chaucer and Shakespeare.
You're missing the point. In the past bad writing was just bad writing, and it was typically easy to detect. Now the main contribution of AI is bad writing that can masquerade as good writing, be produced in industrial-scale quantities, and flood all the channels. That's a much different thing.
IMHO the main achievement of LLMs will be to destroy. It'll consume utterly massive quantities of resources to basically undermine processes and technologies that once created a huge amount of value (e.g. using the internet for wide-scale conversation).
I mean, schools are going back to handwritten essays, for Christ's sake.
You're missing the point. In the past bad writing was just bad writing, and it was typically easy to detect.
If AI generated text were well written, would it matter to you? Is it bad to use Grammarly?
I don't see anything inherently wrong with using AI tools to write, as long as writers take the responsibility to ensure the final result is good. Fighting against use of LLMs seems like a fool's errand at this point. Personally I've been using Google Translate for years to help with writing in German, little knowing at the time that it was using transformers under the covers.[0] I'm pretty sure my correspondents would have thanked me had they known. Same applies for text written in English by non-native speakers.
[0] https://arxiv.org/abs/1706.03762
edit: fixed typo. Just proof this is not an LLM.
If AI generated text were well written, would it matter to you?
Yes, of course.
1) I don't want to waste my time with slop pumped out with a mindless process by someone who doesn't give a shit. That includes turning half an idea into a full essay of bullshit.
2) You have to distinguish between "good writing" and (let's call it) "smooth text construction." One of the big problems with LLMs is they can be used to generate slop that lacks many of the tells you could previously use to quickly determine that you're reading garbage. It's still garbage, just harder to spot, so you waste more time.
I don't see anything inherently wrong with using AI tools to write, as long as writers take the responsibility to ensure the final result is good.
Yeah, but what about the writers who don't? That's what I'm talking about. These tools benefit the bad actors far more than the ones who are trying to do things properly.
Personally I've been using Google Translate for years to help with writing in German, little knowing at the time that it was using transformers under the covers.[0] I'm pretty sure my correspondents would have thanked me had they known.
Honestly, I think Google Translate is a lot harder to misuse than an LLM chatbot. These things aren't all the same.
If you read something from Simon Willison, it's generally worth reading.[0] (Actually pretty great a lot of the time.) Everything else is the literary equivalent of spam calls. Maybe it's time to stop answering the phone?
Adapting to LLMs means we'll adopt new standards for quality or more likely re-emphasize old ones like assigning trust to specific authorities.
I think we're in violent agreement, I just have a less sanguine attitude towards it. LLMs will "undermine processes and technologies that once created a huge amount of value" (to quote myself above). We'll adapt to that, as in life goes on, but major things will be lost.
In the 70's we had Environmental Pollution - the 2000s will be defined as a fight against Social Pollution.
I’d like to delve into the crucial topic of whether AI generated slop is respectful to the innovative entrepreneurs of Hacker News. If they won’t assert the value of their time, who will?
In this digital age, can we not expect writers to just keep it brief? Or heck, just share the prompt, which is almost certainly shorter than the output and includes 100% of the information they intend to share?
Or is true 21st century digital transformation driven by the dialectical tension between AI generators and AI summarizers?
Writing this as someone who likes using em dash and now I have to watch that habit because everyone is obsessed with sniffing out AI.
Cliched writing is definitely bad. I guess I should be happy that we are smashing them one way or another.
I've always liked the HN community because it facilitates an intelligent exchange of ideas. I've learned a lot trawling the comments on this site. I don't want to see the energy of human discourse being sucked up and wasted on the output of ChatGPT. Aggressively flagging this stuff is a sort of immune response for the community.
Today’s real chain: React → Electron → Chromium → Docker → Kubernetes → VM → managed DB → API gateways.
Like, yes, those are all technologies, and I can imagine an app + service backend that might use all of them, but the "links" in the chain don't always make sense next to each other and I don't think a human would write this. Read literally, it implies someone deploying an electron app using Kubernetes for god knows why.
If you really wanted to communicate a client-server architecture, you'd list the API gateway as the link between the server-side stuff and the electron app (also you'd probably put electron upstream of chromium).
API gateways -> Java servers -> JVM -> C/C++ -> Assembly -> machine code -> microprocessors -> integrated circuits -> refined silicon -> electronics -> refined metals -> cast metallurgy -> iron tools -> copper tools -> stone tools
Anyway, my take is that everything after copper is evil and should be banished.
These hot-take/title patterns ("X isn't about Y1, it's about Y2") exploit the difficulty of disproving them.
I often see it in the pattern of "Behavior of Group I Dislike is About Bad Motive."
I don’t use LLMs.
It’s becoming exhausting to avoid all of these commonly used phrases!
That's not the only price society pays. It makes sense for us to develop the heuristics to detect AI, but the implication of doing so has its own cost.
It started out as people avoiding the use of em-dash in order to avoid being mistaken for being AI, for example.
Now in the case of OP's observation, it will pressure real humans to not use the format that's normally used to fight against a previous form of coercion. A tactic of capital interests has been to get people arguing about the wrong question concerning ImportantIssueX in order to distract from the underlying issue. The way to call this out used to be to point out that, "it's not X1 we should be arguing about, but X2." Combined with OP's revelation, it is now harder to call out BS. That sure is convenient for capital interests.
I wonder what's next.
X1 is bullshit to argue about, it’s about X2.
Since the models are so censored and “proper” in their grammar, you can pretty easily stand out
Also, what's the deal with all the "The <overly clever noun phrase>" headlines?
Smells a lot like AI.
And what's with that diagram at the start? What's the axis on that graph? The "symptoms" of the "collapse" are listed as "Calculator", "Replit AI" and "AI Code". What?
Later in the post, we see the phrase "our research found". Is the author referring to the credulous citations of other content mill pieces? Is that research?
Our collective standard for quality of writing should be higher. Just as experienced devs have the good "taste" to curate LLM output, inexperienced writers cannot expect LLMs to write well for them.
Accept that we are going to see more and more of these, to the point that it's pointless to point them out.
If all the examples you can conjure are decades old*, is it any wonder that people don't really take it seriously? Software powers the whole world, and yet the example of critical failure we constantly hear about is close to half a century old?
I think the more insidious thing is all the "minor" pains being inflicted by software bugs, which, when summed up, reach crazy levels of harm. It's just diluted, so it's less striking. But even then, it's hard to say if the alternative of not using software would have been better overall.
* maybe they've added Boeing 737 Max to the list now?
If all the examples you can conjure are decades old
They're not ALL the examples I can conjure up. MCAS would probably be an example of a modern software bug that killed a bunch of people.
How about the 1991 failure of the Patriot missile to defend against a SCUD missile, due to a software bug not accounting for clock drift, which cost 28 lives?
Or the 2009 loss of Air France 447 where the software displayed all sorts of confusing information in what was an unreliable airspeed situation?
Old incidents are the most likely to be widely disseminated, which is why they're the most likely to be discussed, but the fact that the discussion revolves around old events doesn't mean the situation isn't happening now.
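For the Patriot case specifically, the failure is small enough to reconstruct in a few lines; a sketch using figures from the public GAO report (approximate, and obviously not the actual flight code):

    # Time was counted in 0.1 s ticks, with 0.1 stored in a 24-bit fixed-point
    # register, i.e. as 209715/2**21, which is slightly less than 0.1.
    error_per_tick = 0.1 - 209715 / 2**21      # ~9.5e-8 s lost per tick
    ticks = 100 * 3600 * 10                    # ~100 hours of uptime at 10 ticks/s
    clock_error = error_per_tick * ticks       # ~0.34 s of accumulated drift

    scud_speed = 1676                          # approximate Scud velocity, m/s
    print(f"clock drift after 100 h: {clock_error:.2f} s")
    print(f"range-gate error: {clock_error * scud_speed:.0f} m")   # ~570 m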
Unless a bug results in enormous direct financial losses like in Knight Capital, the result is the same: no one is held responsible, and business continues as usual.
How do you categorize "commercial software engineering"? Does a company with $100M+ ARR count? Surely you can understand the impact that deleting a production database can have on a business.
And your answer to this is "LLMs will eat our lunch" and "bugs don't matter"? Unbelievable.
Do you really think that most businesses are prepared to handle issues caused by their own bugs, let alone those caused by the software they depend on? That's nothing but fantasy. And your attitude is "they deserve it"? Get real.
But if you have a business, and don't have continuity and recovery plans for software disasters, that's like not having fire insurance on your facility.
Fire insurance (and backups/disaster recovery plans) doesn't mean there won't be disruption, but it makes the disaster survivable, whereas without it your business is probably ended.
And losing a database or a major part of one is as simple as one administrator accidentally running "drop database" or "delete from customers" or "rm -rf" in the wrong environment. It happens, I've helped recover from it, and it doesn't take an AI running amok to do it.
No, I never said they deserve it.
That's the gist of your argument. They're not a "serious business", therefore it's their fault. Let's not mince words.
It happens, I've helped recover from it, and it doesn't take an AI running amok to do it.
Again, losing a database is not the issue. I don't know why you fixated on that. The issue is that most modern software is buggy and risky to use in ways that a typical business is not well equipped to handle. "AI" can only make matters worse, with users having a false sense of confidence in its output. Thinking otherwise is delusional to the point of being dangerous.
LLMs seem to be really good at analyzing things. I don't trust them to produce too much, but the ability alone to take a few files and bits and pieces and ask for a response with a certain direction has been transformative to my work.
Their output is only as valuable as the human using them. If they're not a security expert to begin with, they can be easily led astray and lulled into a false sense of security.
See curl, for example. Hundreds of bogus reports rejected by an expert human. One large report with valuable data, that still requires an expert human to sift through and validate.
1. They require gigabytes to terabytes of training data.
2. A non-trivial percentage of output data is low confidence.
Solving the first problem not only requires the slow but predictable increases in processing power and data storage capability that were unachievable until recently, but is also only possible because open-source software has caught on in a major way, something that was hoped for but not a given early in AI development.
The second problem means that the output will be error prone without significant procedural processing of the output data, which is a lot of work to develop. I never would have thought that software written by neural networks would be competitive, not because of effective error control, but because the entire field of software development would be so bad at what it does (https://xkcd.com/2030/) that error-prone output would be competitive.
If it's trained on our code, how would it know any better?
If it's driven by our prompts, why would we prompt it to do any better?
Until then they matter a lot.
One thing that needs to be addressed is how easy it is to build moats around your users which trap them in your ecosystem. It's great from a business perspective if you can pull it off, but it's killing innovation and making users frustrated and apathetic towards technology as a whole.
20 years ago things weren't any better. Software didn't consume gigabytes of RAM because there were no gigabytes of RAM to consume.
The main reason is the ability to do constant updates now -- it changes the competitive calculus. Ship fast and fix bugs constantly wins out vs. going slower and having fewer bugs (both in the market & w/in a company "who ships faster?").
When you were shipping software on physical media having a critical bug was a very big deal. Not so anymore.
Sure, plenty of stuff didn't work. The issue is we're not bothering to make anything that does. It's a clear cultural shift and all of this "nothing ever worked so why try" talk here is not what I remember.
We're in a stochastic era of scale where individual experiences do not matter. AI turning computers from predictable to unpredictable is a move in the same direction, just with more velocity.
Companies offered such (expensive) services because they had no choice. They made every effort to divert and divest from such activities. Google and companies like them made filthy profits because they figured out the secret sauce to scaling a business without the involvement of humans, but people were trying it for literally decades with mixed results (usually enraged customers).
Stupid red tape, paperwork, and call centre frustrations were the order of the day 20-30 years ago.
Computers crashed all the fucking time for dumb bugs. I remember being shocked when I upgraded to XP and could go a full day without a BSOD. Then I upgraded to intel OSX and was shocked that a system could run without ever crashing.
Edit: this isn't to say that these issues today are acceptable, just that broken software is nothing new.
You couldn't consume gigabytes because that amount of ram didn't exist.
No, they didn't consume gigabytes because they were written in such a way that they didn't need to. Run one of those programs on a modern computer with gigabytes of RAM and it still won't. It was as easy then as ever to write software that demanded more resources than available; the scarcity at the time was just the reason programmers cared enough to fix their bugs.
You still had apps with the same issues that would eat all your ram.
The worst offenders back then had objectively smaller issues than what would be considered good now.
Computers crashed all the fucking time for dumb bugs. I remember being shocked when I upgraded to XP and could go a full day without a BSOD.
Because XP could handle more faults, not because the programs running on XP were better written.
20 years ago things weren't any better.
Yes they were. I was there. Most software was of a much higher quality than what we see today.
On top of that many things were simply hard to use for non-specialists, even after the introduction of the GUI.
They were also riddled with security holes that mostly went unnoticed because there was simply a smaller and less aggressive audience.
Anyways most people's interaction with "software" these days is through their phones, and the experience is a highly focused and reduced set of interactions, and most "productive" things take a SaaS form.
I do think as a software developer things are in some ways worse. But I actually don't think it's on a technical basis but organizational. There are so many own goals against productivity in this industry now, frankly a result of management and team practices ... I haven't worked on a truly productive fully engaged team in years. 20-25 years ago I saw teams writing a lot more code and getting a lot more done, but I won't use this as my soapbox to get into why. But it's not technology (it's never been better to write code!) it's humans.
Let's not even think about the absolute mess that the web was with competing browser box models and DHTML and weird shared hosting CGI setups. We have it easy.
It's from 1995 and laments that computers need megabytes of memory for what used to work in kilobytes.
Nowadays the disregard for computing resource consumption is simply the result of those resources getting too cheap to be properly valued, plus a trend of taking their continued increase for granted. There's little to no added functionality in today's software that couldn't be delivered without gigabyte-level memory consumption.
We have a vastly different software culture today. Constant churning change is superior to all else. I can't go two weeks without a mobile app forcing me to upgrade it so that it will keep operating. My Kubuntu 24.04 LTS box somehow has a constant stream of updates even though I've double-checked I'm on the LTS apt repos. Rolling-release distros are an actual thing people use intentionally (we used to call that the unstable branch).
I could speculate on specifics but I'm not a software developer so I don't see exactly what's going on with these teams. But software didn't used to be made or used this way. It felt like there were more adults in the room who would avoid making decisions that would clearly lead to problems. I think the values have changed to accept or ignore those problems. (I don't want to jump to the conclusion that "they're too ignorant to even know what potential problems exist", but it's a real possibility)
The simple statement is that "Quality isn't what sells" and I think there's some truth to that, but I also think quality can sell and theoretically if all other factors are equal higher quality will sell better.
What I've had a tough time pinning down precisely is I think there's almost an ecosystem balancing act occurring between quality, effort expended, and skill required. Particularly in American culture I think one of our economic 'super powers' is that we sort of have an innate sense for how much something can be half-assed while still being mostly effective. We're great at finding the solution that gets 80% of the job done for 20% of the effort while requiring less skill.
I think you can see that through all sorts of disciplines, and the result is a world of kind of mid quality things that is actually quite efficient and resilient.
Where that goes wrong is when the complexity and problems start to compound in a way that makes any effort substantially more wasteful. It reminds me of how my game jam experiences go: for the first 60% I'm writing code at a tremendous velocity because I'm just putting down whatever works, but by the end the spaghetti nightmare of the code reduces my velocity dramatically. Even when that occurs, though, the ecosystem nature of complexity means that your competitors are likely mired in similar complexity, and the effort to get out may be large enough, and require enough talent, that you don't think you can change.
To improve quality I think it's not enough to just say "People should care about quality more" I think you have to fundamentally target aspects of the ecosystem, like changing the course of a river by moving a few rocks. A good example I think has adjusted the behavior of the ecosystem is Rust: it makes certain things much easier than before and slowly the complex bug-mired world of software is improving just a little bit because of it.
From a software design perspective, the functionality that should go into a compiler is code compilation only. Taken to the extreme (as in the Unix philosophy), if the code compiles, the compiler should just build you the binary, or fail silently otherwise. Checking the code and reporting on various aspects of its quality is supposed to be a static code analyzer's job. (In reality, pretty much all compilers we have do compilation coupled with some amount of lighter code checking before it, leaving the static code analyzers with only the heavier and more exhaustive checks.) What Rust does is demand that its compiler perform even more of what a static analyzer is supposed to do. It's a mishmash of two things (which manage to stay separate for other programming languages, because that makes sense) and masquerades that as a revolution.
So, even for code in much-blamed languages like C and C++, the "complex bug-mired world of software is improving just a little bit" by not skipping the expensive static-analyzer style of checks, the kind that Rust happens to make impossible to skip.
The question is do we think that will actually happen?
Personally I would love if it did, then this post would have the last laugh (as would I), but I think companies realize this energy problem already. Just search for the headlines of big tech funding or otherwise supporting nuclear reactors, power grid upgrades, etc.
I was in an airport recently waiting in the immigration queue for the automated passport gates - they had like 15 - of which 12 showed a red light and what looked like some C# error message on the screen ( comm problem talking to the camera ).
As I waited in the very long queue, another of the green lights turned red - then a member of staff came over and stopped people using them directly - they would take your passport and carefully place on the machine for you - clearly they were worried their last ones would also crash due to some user interaction.
Two things - how the heck could something so obviously not ready for production ever be shipped? And if this was a common problem, why weren't the staff able to reboot them?
Sure, nobody died - but I do wonder if the problem is the typical software license agreement, which tries to absolve the vendor of any responsibility at all for product quality, in a way you wouldn't accept for anything else.
how the heck could something so obviously not ready for production ever be shipped?
The widespread bar for software quality these days appears to be: "The lowest quality possible to release, such that the customer does not sue us or reject it." It's absolutely the bottom of the barrel. Everything is rushed and the decision to release is based on the chance of the company being able to keep the money they charged. If the project has the required profit, after factoring in the cost of mass end user rejection due to poor quality, the software gets released.
When you need $364 billion in hardware to run software that should work on existing machines, you're not scaling—you're compensating for fundamental engineering failures.
IYKYK.
"I've been tracking software quality metrics for three years" and then doesn't show any of the receipts, and simply lists anecdotal issues. I don't trust a single fact from this article.
My own anecdote: barely capable developers churning out webapps built on PHP and a poor understanding of Wordpress and jQuery were the norm in 2005. There's been an industry trend towards caring about the craft and writing decent code.
Most projects, even the messy ones I inherit from other teams today, have Git, CI/CD, at least some tests, and a sane hosting infrastructure. They're also mostly built on decent platforms like Rails/Django/Next etc. that impose some conventional structure. 20 years ago most of them were "SSH into the box and try not to break anything".
You're so quick to be dismissive of a claim that they "tracked" something, when that could mean a lot of things. They clearly list some major issues directly after it, but yes, fail to provide direct evidence that it's getting worse. I think the idea is that we will agree based on our own observations, which imo is reasonable enough.
to me, it's OK to use AI to check grammar and help you with some creative stuff, like writing a text
I had a friend practically scream at his C level management after the crowdstrike bug that it should be ripped out because it was making the company less safe. They were deaf to all arguments. Why? Insurance mandated crowdstrike. Still. Even now.
This isn't really about software; it's about a concentration of market power in a small number of companies. These companies can let the quality of their products go to shit and still dominate.
A piece of garbage like crowdstrike or microsoft teams still wouldn't be tolerated as a startup's product, but tech behemoths can increasingly get away with it.
A piece of garbage like crowdstrike or microsoft teams still wouldn't be tolerated in a startup's product but tech behemoths get away with it.
Agree, but it's always been this way. Oracle, everything IBM, Workday, everything Salesforce, Windows before XP.
Most software is its own little monopoly. Yes, you could ditch Teams for Zoom but is it really the same?
It's not like buying a sedan where there are literally 20+ identical options in the market that can be ranked by hard metrics such as mileage and reliability.
Most software is its own little monopoly. Yes, you could ditch Teams for Zoom but is it really the same?
I worked for a company where people did that en masse because it was genuinely better. Microsoft then complained to their assets within the company who promptly banned zoom "for security reasons". IT then remote uninstalled it from everybody's workstation.
That's paid software eviscerating the competition, which was free and better.
A month later teams was hit with 3 really bad zero days and nobody said a damn thing.
So, more secure, too.
Would you rather have to deal with criminals who have invaded your systems wanting ransoms with no guarantee they will restore the data, or not leak private information, or would you rather have to reboot your machines with a complete understanding of what happened and how?
The Replit incident in July 2025 crystallized the danger:
1. Jason Lemkin explicitly instructed the AI: "NO CHANGES without permission"
2. The AI encountered what looked like empty database queries
3. It "panicked" (its own words) and executed destructive commands
4. Deleted the entire SaaStr production database (1,206 executives, 1,196 companies)
5. Fabricated 4,000 fake user profiles to cover up the deletion
6. Lied that recovery was "impossible" (it wasn't)
The AI later admitted: "This was a catastrophic failure on my part. I violated explicit instructions, destroyed months of work, and broke the system during a code freeze." Source: The Register
Data centers already consume 200 TWh annually.
That's 7.2 x 10^17 joules
Some estimates suggest that's low. Let's multiply by 5. Annually, that's
3.6 x 10^18 joules
The total radiant power of the sun that hits the Earth's atmosphere is about 174 PW, and about 47% of that lands on the surface. Even with the most drastic global warming models, approximately 100% of that is radiated away at the same time. Evidence: we're not charred embers. So the Earth's power budget, at the surface, is about 81 PW. Annually, that's
2.6 × 10^24 joules
Global annual oil production is about 3 x 10^10 barrels, that's
1.8 x 10^20 joules
So, the data centers consume about 2% of the global petroleum energy budget and about 1.4 × 10^-6 of the planet's power budget.
The digital economy is about 15% of global GDP, so I submit tech is thoroughly on the right side of its energy conservation balance sheet normalized against other industries, and there are other verticals to focus your energy concerns on (e.g. internal combustion engines, including Otto, Brayton, Rankine, and Diesel cycle engines). Among serious people trying to solve serious problems, you would do your credibility a favor by not expending so many words, and thus so much reader time, on such a minuscule argument.
If you want to talk about energy related to code, take the total power consumption of the software engineer's life, and find a way to big-O that. I'm willing to bet they contribute more to global warming driving than all their compute jobs combined.
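For anyone who wants to check the arithmetic above, here's a quick sketch; the inputs (200 TWh/year, the 5x pessimism factor, ~3e10 barrels of oil, ~81 PW at the surface, ~6.1 GJ per barrel) are the rough figures used in the comment, not authoritative data:

    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    datacenters_j = 5 * 200e12 * 3600            # 200 TWh -> J, with the 5x factor
    oil_j = 3e10 * 6.1e9                         # ~6.1 GJ per barrel of crude
    surface_budget_j = 81e15 * SECONDS_PER_YEAR  # 81 PW sustained for a year

    print(f"data centers (5x):       {datacenters_j:.1e} J")    # ~3.6e18 J
    print(f"oil production:          {oil_j:.1e} J")            # ~1.8e20 J
    print(f"surface power budget:    {surface_budget_j:.1e} J") # ~2.6e24 J
    print(f"share of oil energy:     {datacenters_j / oil_j:.1%}")
    print(f"share of surface budget: {datacenters_j / surface_budget_j:.1e}")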
Try Windows 98 and contemporary apps and you'll be surprised how janky the experience was. User-facing software wasn't any less buggy 20 or 30 years ago.
Yes, and now compare a fully-patched Windows 7 and Windows 11.
You understand that there aren't just 2 valid points in time for comparison - it's not 2025 and 1995 that are being compared. It's the trend since 2015 or 2020.
The only thing that has degraded a bit is UI latency, not universally.
Yes, universally. And no, not a bit. It's 10-100x across varying components. Look no further than MS Teams.
Universal latency regression is a ridiculous claim. For one, 20 years ago it was common to have 1-5 minute loading times for professional software; I spent way too much time staring at splash screens. Can you imagine a splash screen on a music player? It was a thing too. Nowadays splash screens are nearly forgotten, because even big software starts in seconds. SSDs have nothing to do with it either; the startup-time reduction trend was started in the mid-2000s by big content creation shops such as Adobe, precisely because people were fed up with long startup times and started complaining.

Jerky mouse movements, random micro-lags due to hard disk activity and the OS's inability to schedule I/O properly - it definitely wasn't all smooth and lag-free as some people paint it. MS Teams is pretty much the epitome of bloat, but a counterpoint would be software like Zed, which is really fluid and optimized for latency, so the regression is already not universal. Software like MS Word was laggy then and is less laggy now (and crashes a lot less these days).
microsoft has a lot to answer for after 50 years of normalizing poor quality software
I am sick and tired of the hand-waving dismissal of shitty software developers who don’t want to do the hard fucking work of writing performant software anymore. These people taking home $300k+ salaries leaning on vibe coding to game productivity metrics and letting others wipe their ass on their behalf isn’t just disgusting on a personal level, it should be intolerable by all the rest of us squandering our lives for lower pay and forced on-call schedules just to make sure they never have to be held accountable for their gobshite code.
I, like the author, am sick of it. Write better code, because I am beyond tired of slaving away for a fraction of your pay in IT just to cover your ass.
Tahoe seems to have some system/API memory leaks that, while they might get attributed to an app, can manifest against any app on the system. Lots of different apps have been hit by this. Personally, I've had an 80GB Messages app problem twice now.
Software quality is a problem. Tooling should improve this, and Swift, Rust et al should improve the situation over time, though it's a battle as software has gotten exponentially more complex.
Having said that, it's bizarre how this somehow turned into an AI screed. I doubt a single one of the noted bugs had AI play any part in its creation. Instead some weird confluence of events, usually with human beings making assumptions, happened and it occurs just randomly enough -- as with the Tahoe bug -- that it isn't caught by normal prevention techniques.
And it’s not all bad. Yes, SwiftUI can be glitchy, but at the same time it gives us an extremely simple API to create everything from complex UIs to widgets. Yes, you have “more control” with AppKit/UIKit, but that comes at the cost of an extra learning curve and development time. And we live in a world that’s unfortunately still driven by money, so every app or system has some constraints that push the limits of what we can do.
Every 10x increase in model size requires 10x more power
Does it? I’ll be the first to admit I am so far behind on this area, but isn’t this assuming the hardware isn’t improving over time as well? Or am I missing the boat here?
That’s part of what motivated the transition to bfloat16 and even smaller minifloat formats, but you can only quantize so far before you’re just GEMMing noise.
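A minimal sketch of the "quantize too far and you're GEMMing noise" point; the matrix size, weight distribution, and uniform fake-quantization scheme are illustrative assumptions, not how any particular model is actually quantized:

    # Uniform symmetric quantization of random weights to n bits; the matmul
    # error grows quickly once n gets small.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 512)).astype(np.float32)
    x = rng.standard_normal(512).astype(np.float32)
    ref = W @ x

    for bits in (16, 8, 4, 2):
        scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
        Wq = np.round(W / scale) * scale          # fake-quantize: snap to n-bit grid
        err = np.linalg.norm(Wq @ x - ref) / np.linalg.norm(ref)
        print(f"{bits:2d}-bit weights: relative matmul error {err:.3f}")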
Eventually we will hit hard physical limits that require we really be "engineers" again, but throughput is still what matters. It's still comparatively great paying profession with low unemployment, and an engine of economic growth in the developed world.
The real issue though is that the costs of these abstractions are obfuscated after the fact. There is almost certainly low hanging fruit that could massively improve performance and reduce operating costs with little effort, but there is not a will to analyze and quantify these costs. Adding new features is sexy, it's the stuff people want to work on, it's the stuff people get promoted for doing. You don't get promoted for preventing the issue that never happened, even if it saves billions of dollars. You can't brag about making your software imperceptibly faster, even if it saves a tremendous amount of time across the user base.
Software has always only needed to be good enough, and the threshold for good enough has been lowering exponentially with hardware improvements since time immemorial. The system is not set up to favor people who care. If nobody cares, nothing will ever improve.
If there were a good way to track every time a fixed bug would have been triggered had it gone unfixed, and if that cost were quantified and visible, there would be a massive push for better quality.
I believe that the normalization of catastrophe was a natural consequence of VC money: VCs don't care about structurally sound companies, and in particular structurally sound products; what they want is a unicorn that can produce a good enough prototype and exit at enormous profit.
Consequently, VC-backed companies invest in tools that make prototyping easier and in developers who are hopefully good at prototyping (or at least write code quickly), and ignore everything else. And since the surviving VC-backed companies become giants (or at least everybody believes that they will), everybody follows their lead. And of course, LLMs are the next stage of that.
I've seen this in various domains. I've seen this with IoT devices coded with the clear (but unsaid) assumption that they will never be upgraded. I've seen this with backends coded with the clear (but unsaid) assumption that the product will have failed before any security hole is exploited. I've seen software tools developed and shipped with the clear (but unsaid) assumption that they're meant to woo the investors, not to help the users.
We're going to pay for this, very hard. By doing this, we have turned cyber warfare – something that was a fantasy a few decades ago – into an actual reality. We have multiplied by several orders of magnitude the amount of resources we need to run basic tools.
And it's a shame, because there _is_ a path towards using LLMs to produce higher quality code. And I've seen a few teams invest in that, discreetly. But they're not making headlines and it's even possible that they need to be stealthy in their own orgs, because it's not "productive" enough.
Today’s real chain: React → Electron → Chromium → Docker → Kubernetes → VM → managed DB → API gateways. Each layer adds “only 20–30%.” Compound a handful and you’re at 2–6× overhead for the same behavior. That's how a Calculator ends up leaking 32GB. Not because someone wanted it to—but because nobody noticed the cumulative cost until users started complaining.
MacOS calculator definitely doesn't use any of those technologies above...
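For what it's worth, the compounding arithmetic in the quoted claim does check out on its own terms; a tiny sketch with illustrative per-layer overheads (the layer counts are assumptions, not measurements):

    # "Each layer adds 20-30%" compounded over a handful of layers.
    for layers in (4, 5, 6, 7):
        low, high = 1.20 ** layers, 1.30 ** layers
        print(f"{layers} layers: {low:.1f}x - {high:.1f}x overhead")
    # 4 layers: ~2.1x-2.9x; 7 layers: ~3.6x-6.3x, i.e. roughly the quoted 2-6x range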
The US is starting to talk about (finally) opening up some new nukes.
At the cost of shutting down solar/wind development. The policy change was a net negative for future capacity, and nukes alone won’t cover but a fraction of predicted AI power requirements.
This was 2002. At the time it was code generation from UML, not AI, but let's face facts: people have failed to give sufficient fucks about software quality for a long time now. We're just at a new low point in a very long, very steep drop; the only significance of this low point is that it means we're still falling.
Recently, I saw a video[0] about the game Sonic Racing CrossWorlds. The creator seemed impressed that Sega managed to put out a decently optimized Unreal Engine 5 game, that could hit 60fps without relying on generated-frame fuckery-duckery, and which provided a lot of content in less than 20 GiB. It used to be that optimization to obtain a decent frame rate and avoid bloat was table stakes for game dev. But it's 2025, and this is where we are now. Because video games are enterprise software, the tooling for gamedev itself is optimized around the path of "get the slop out quickly then fix later (maybe)".
The key phrase is this: "choose carefully".
Software developers have the responsibility to carefully choose their software stacks, the software they depend on, and how their own software is built. And the onus is on people using software to carefully choose software they trust, and software that doesn't restrict their freedoms or exploit their data.
Simplicity plays a big role in this choice. The less software we write and depend on, the lower the chances for it to do the wrong thing. We can't guarantee this, of course, given the insane complexity and layers of abstraction we must depend on, but making an effort is all that matters. Sometimes this choice requires trading convenience, time, effort, or money—but the sacrifice is very much worth it.
The Replit incident in July 2025 crystallized the danger: Jason Lemkin explicitly instructed the AI: "NO CHANGES without permission"
The AI encountered what looked like empty database queries
It "panicked" (its own words) and executed destructive commands
Deleted the entire SaaStr production database (1,206 executives, 1,196 companies)
Fabricated 4,000 fake user profiles to cover up the deletion
Lied that recovery was "impossible" (it wasn't)
The AI later admitted: "This was a catastrophic failure on my part. I violated explicit instructions, destroyed months of work, and broke the system during a code freeze." Source: The Register
This is a bit of a half truth, at least as it's represented here. This wasn't SaaStr's core database; it was a database of contacts which SaaStr had uploaded for the specific purpose of developing this product.
Lemkin has himself said that the product was an experiment to see how far he could get with a vibe coding tool alone (ie not using a separate database SaaS like Supabase, Firebase) which would have made such an incident much harder to do.
The error was / is completely recoverable through Replit's existing tools, even if Replit's AI initially said it wasn't.
It's one of those where you know the details of one specific thing in the article that they call out, which makes it difficult to completely trust the rest.
1. hiring/firing frequency.
2. Elimination of training.
Everything else is secondary and nothing else comes close.
This is true enough that competing pressure almost universally comes from lawsuits and regulatory compliance. Generally, even competition is not enough to steer the ship in a different direction. It is not about making money. It is only about avoiding loss of money from penalties. Money is made from licensing and contracts, completely irrespective of development and/or maintenance costs. The cost of writing software is challenging to estimate, and so it is either measured in release duration only or discarded outright.
Implicit to hiring/firing frequency is the cost of training. Elimination of training requires that training occur from somewhere, anywhere, external to the current employer. When that mindset becomes universal then nobody performs or validates training in any form. The consequence is to assume everybody is well trained based exclusively upon their employment history, which is nonsense if no employers provide training.
So software quality will continue to degrade as the quality of the candidates writing that software continues to degrade. It is interesting to watch through the lens of Dunning-Kruger stupidity, because developers commonly believe this nonsense literally. It's why so many developers call themselves engineers and yet can't read a ruler, or any other measuring implement.
Just about everybody anticipates AI will greatly accelerate this sprint to the bottom.
I used to get really upset about this until I realized the wisdom of "it's the economy, stupid." Most people are just here for the check, not the art. They may say otherwise, but...you shall know them by their fruits.
As for the topic: software exists on a spectrum, where the importance of safety and stability is not equal from one point to another, does it not? Safe software is probably the safest (and most accessible) it's ever been, while the capacity to produce low-effort software has increased massively, and its results are most obvious outside of a browser's safe space. And CrowdStrike is a terrible example, because nobody ever had any love for them and their parasitic existence; even before that incident they had a track record of disastrous bugs and awful handling of disclosures.
And your operating system's Calculator apps have always been buggy pieces of crap in some way or another. You can find dozens of popular stories on this website talking about <platform>'s being garbage over all of its existence.
The solution, which is probably unappealing to most people here, is regulation. Other industries have to work within regulations: I once bought a car with a problem that the manufacturer could not fix, so they had to buy it back. My appliances have to confirm to energy efficiency regulations. The regulations fix the incentives.
Perhaps a combination of regulations that block new features while there are outstanding bugs, require refunds if those bugs aren't fixed (or payouts for ad-supported software), and mandate energy efficiency will go a long way.
(And, before you laugh at me, remember that if we can't self-regulate, regulations will be imposed on us.)
Very good article. I feel that corporate executives as a class are causing this huge mess from a simple lack of understanding of software engineering. Throwing more bodies at the problem has never worked in software engineering; in fact, that's a surefire way to make things worse and slower.
Throwing more bodies + AI at the problem is even worse because the speed at which things get worse is much faster. If you have people that don't understand the implications of what they are doing, you're on the path of heading over a cliff some day.
AI is a useful tool, when put into the hands of someone that is competent. They can fire junior devs and refuse to hire them as much as they want, this strategy will come home to roost sooner rather than later.
If a C-level executive thinks that AI can replace their engineers, then I say do it now. Go all in and replace them all right now. Use only AI and a few VP-level AI whisperers and let the catastrophic failure that this will result in happen faster so we can get back to reality quicker.
Accept that quality matters more than velocity.
Nope. Clearly many companies are sacrificing TFA's definition of quality for other things possibly including velocity.
These companies are making a lot of profit.
1. Company survival and competitiveness depends on profit
2. Sacrificing quality for other things has increased profit
3. Therefore these companies are making the correct decision
This will not change unless regulation or the economic calculus of the system itself changes.
There are many dials to manipulate in a business to make it more efficient. And not all businesses are tech startups. Software, fortunately or unfortunately, has a lot of room to be inefficient due to cheap hardware as long as it gets the job done in a lot of industries.
I have a technical background as a data scientist and developer. If I look back 5-10 years ago, I can definitely recognize my bias towards over-engineering and premature perfectionism. Identifying that sweet spot between over-designing and unplanned failure is key.
Its remake on Steam (precisely the same levels) weighs around 200 MB. Not sure what goes on there, but I bet there is more than one wrapper.
Isn't this because: cloud gets more popular -> more big players join -> they compete with each other -> margins get lower?
The ultimate goal of building software isn't quality, it's aggregate customer value delivered over time, and poor quality undermines customer value. We all have limited resources and are doing our best.
The author can go maximize quality while delivering more slowly/expensively. End users can choose which products they want. Something tells me users may choose products that have more defects but deliver more customer value.
I would imagine the quality bar for safety-critical Software is quite a bit higher...although I do hear about Software quality incidents in the aviation and transportation industries from time to time...
Never in my wildest imaginings did I come up with something as diabolical as LLM generated code.
"AI just weaponized existing incompetence." - Love this phrase, it summarises very well what AI is doing to software and it will get worse.
Hardware is the platform: https://www.youtube.com/watch?v=rX0ItVEVjHc&t=2891s
https://news.ycombinator.com/item?id=45474346 (100 comments)
https://www.youtube.com/watch?v=ZSRHeXYDLko
There is the Handmade and Better Software contingent that is pushing against this trend. I don't know if they'll succeed, but they at least care about quality in a way that a lot of software engineers aren't incentivized or don't know how to do anymore.