A single line of code cost $8000
screen.studio is macOS screen recording software that checks for updates every five minutes. Somehow, that alone is NOT the bug described in this post. The /other/ bug described in this blog is: their software also downloaded a 250MB update file every five minutes.
The software developers there consider all of this normal except the actual download, which cost them $8000 in bandwidth fees.
To recap: Screen recording software. Checks for updates every five (5) minutes. That's 12 times an hour.
I choose software based on how much I trust the judgement of the developers. Please consider if this feels like reasonable judgement to you.
their software also downloaded a 250MB update file every five minutes
How on earth is a screen recording app 250 megabytes
For me that would also be wrong if I couldn't disable it in the configuration. I do not want to extend startup time.
It's a whole new world out there.
A 250MB download should be opt-in in the first place
I've read on HN that a lot of people have 10Gb Ethernet at home. /s
I work with developers in SCA/SBOM and there are countless devs that seem to work by #include 'everything'. You see crap where they include a misspelled package name and then they fix it by including the right package but not removing the wrong one!
Unless it was absolutely critical that the server have as small a footprint as humanly possible, and it was absolutely guaranteed it would never need to be included in the future, of course. However, that first constraint is the main one.
So yes, it’s insane, but easy to see where the size comes from.
Also, webapps are just great nowadays; most OSes support installing PWAs fairly decently, no?
ffs
For example, on Linux, it uses WebKitGTK as the browser engine, which doesn't render the same way Chrome does (which is the web view used on Windows), so multi-platform support is not totally seamless.
Using something like Servo as a lightweight, platform-independent web view seems like the way forward, but it's not ready yet.
Tauri is not as platform agnostic as Electron is
Found this a few months ago: https://gifcap.dev/
Screen recording straight from a regular browser window, though it creates GIFs instead of video files. Links to a git repo so you can set it up locally.
Tauri is not as platform-agnostic as Electron
I suspect the real reason electron got used here is that ChatGPT/Copilot/whatever has almost no Tauri example code in the training set, so for some developers it effectively doesn't exist.
I imagine if you stick to desktop the situation is less awful but still
Also, webapps are just great nowadays; most OSes support installing PWAs fairly decently, no?
I would say no, and some are actively moving away from PWA support even if they had it before.
Plus, electron et al let you hook into native system APIs whereas a PWA cannot, AFAIK.
their software also downloaded a 250MB update file every five minutes
How on earth is a screen recording app 250 megabytes
How on earth is a screen recording app on an OS where the API to record the screen is built directly into the OS 250 megabytes?
It is extremely irresponsible to assume that your customers have infinite cheap bandwidth. In a previous life I worked with customers with remote sites (think mines or oil rigs in the middle of nowhere) where something like this would have cost them thousands of dollars per hour per computer per site.
517M ─┬ Screen Studio.app 100%
517M └─┬ Contents 100%
284M ├─┬ Resources 55%
150M │ ├── app.asar 29%
133M │ └─┬ app.asar.unpacked 26%
117M │ ├─┬ bin 23%
39M │ │ ├── ffmpeg-darwin-arm64 8%
26M │ │ ├── deep-filter-arm64 5%
11M │ │ ├─┬ prod 2%
10.0M │ │ │ └── polyrecorder-prod 2%
11M │ │ ├─┬ beta 2%
10.0M │ │ │ └── polyrecorder-beta 2%
10.0M │ │ ├── hide-icons 2%
9.9M │ │ ├─┬ discovery 2%
8.9M │ │ │ └── polyrecorder 2%
5.6M │ │ └── macos-wallpaper 1%
16M │ └─┬ node_modules 3%
10M │ ├─┬ hide-desktop-icons 2%
10.0M │ │ └─┬ scripts 2%
10.0M │ │ └── HideIcons 2%
5.7M │ └─┬ wallpaper 1%
5.7M │ └─┬ source 1%
5.6M │ └── macos-wallpaper 1%
232M └─┬ Frameworks 45%
231M └─┬ Electron Framework.framework 45%
231M └─┬ Versions 45%
231M └─┬ A 45%
147M ├── Electron Framework 29%
57M ├─┬ Resources 11%
10.0M │ ├── icudtl.dat 2%
5.5M │ └── resources.pak 1%
24M └─┬ Libraries 5%
15M ├── libvk_swiftshader.dylib 3%
6.8M └── libGLESv2.dylib 1%
The server can return an override backoff, telling the client how often or how quickly to retry.
It's nice to have in case some bug causes increased load somewhere: you can flip a value on the server and relieve pressure from the system.
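A minimal sketch of what that can look like on the client side, assuming a hypothetical JSON update endpoint whose response may include a retryAfterSeconds override (the URL and field names are made up):

    // Sketch: an update check that honors a server-provided backoff override.
    // The endpoint URL and retryAfterSeconds field are illustrative, not real.
    const DAY_MS = 24 * 60 * 60 * 1000;
    const WEEK_MS = 7 * DAY_MS;

    async function scheduleUpdateCheck(delayMs: number = DAY_MS): Promise<void> {
      let nextDelayMs = DAY_MS;
      try {
        const res = await fetch("https://updates.example.com/latest.json");
        const info = await res.json();
        if (typeof info.retryAfterSeconds === "number") {
          // Server is under pressure (or knows better): respect its schedule.
          nextDelayMs = info.retryAfterSeconds * 1000;
        }
        // ...compare info.version against the running version here...
      } catch {
        // Network or server error: back off instead of hammering the endpoint.
        nextDelayMs = Math.min(delayMs * 2, WEEK_MS);
      }
      setTimeout(() => scheduleUpdateCheck(nextDelayMs), nextDelayMs);
    }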
Plenty of things (like the PlayStation's telemetry endpoint, to give one of many examples) just continually phone home if they can't connect.
A few hours a month of PlayStation uptime shows 20K DNS lookups for the telemetry domain alone.
Yes, even in metropolitan areas in developed countries in 2025.
Apparently such a service is still somehow available; I found https://www.dialup4less.com with a web search. It sounds more like a novelty at this point. But "real" internet service still just doesn't work as well as it's supposed to in some places.
In point of fact, I can fairly reliably download at that rate (for example I can usually watch streaming 1080p video with only occasional interruptions). The best case has been over 20Mbit/s. (This might also be partly due to my wifi; even with a "high gain" dongle I suspect the building construction, physical location of computer vs router etc. causes issues.)
MS confirmed the bug but did nothing to fix it.
They are building features right now. There are a lot of bugs which Microsoft will never fix, or fixes only after years (double clicks registered on single mouse clicks, clicking "x" to close a window also closing the window underneath, GUI elements rendered as black because the monitor isn't recognized, etc.).
A while ago I did some rough calculations with numbers Microsoft used to brag about their telemetry, and it came out to around 10+ datapoints collected per minute. But probably sent at a lower frequency.
I also remember them bragging about how many million seconds Windows 10 users used Edge and how many pictures they viewed in the Photo app. I regret not having saved that article back then as it seems they realized how bad that looks and deleted it.
In this case, that means an update should have been sent by some kind of web socket or other notification technology.
Today no OS or software that I'm aware of does that.
Keeping a TCP socket open is not free and not really desirable.
Your app can also be ready to receive notifications even when the app isn't running - using zero RAM. Inetd on Linux allows similar stuff (although having no ability to handle IP changes or traverse NAT makes it fairly useless in the consumer world).
This stuff is important because polling dominates power use when idle - especially network polling, which generally requires hundreds of milliseconds of the system being awake to handle a few dozen packet arrivals just for a basic HTTP request.
Did you know, a typical android phone, if all polling is disabled, has a battery life of 45 days?
It's actually required by the qualification process for lots of carriers. The built in apps have pretty much no polling for this reason.
During the qualification test, it's actually connected to both LTE and WiFi, but not actually transferring any data.
They cheat a little - the phone is not signed into a Google account, which makes pretty much all Google apps go idle.
Just poll every launch or 24 hours and move on.
Turns out Adobe's update service on Windows reads (and I guess also writes) about 130MB of data from disk every few seconds. My disk was 90%+ full, so the usual slowdown related to this was occurring, slowing disk I/O to around 80MB/s.
Disabled the service and the issues disappeared. I bought a new laptop since, but the whole thing struck me as such an unnecessary thing to do.
I mean, why was that service reading/writing so much?
So yes it should only be once a day (and staggered), but on the other hand it's a pretty low-priority issue in the grand scheme of things.
Much more importantly, it should ask before downloading rather than auto-download. Automatic downloads are the bane of video calls...
There are plenty of shitty ISPs out there who would charge $$ per gigabyte after you hit a relatively small monthly cap. Even worse if you're using a mobile hotspot.
I would be mortified if my bug cost someone a few hundred bucks in overages overnight.
We will stop filling your drives with unwanted Windows 14 update files once you agree to the Windows 12 and 13 EULAs and promise to never ever disconnect from the internet again.
Add special signals you can change on your server, which the app will understand, such as a forced update that will install without asking the user.
I don't like that part either.
Screen Studio is a screen recorder for macOS. It is desktop app. It means we need some auto-updater to allow users to install the latest app version easily.
No, it doesn't mean that.
The auto-updater introduced a series of bad outcomes.
- Downloading the update without consent, causing traffic for the client.
- Not only that, the download keeps repeating itself every 5 minutes? You did at least detect whether the user is on a metered connection, right...?
- A bug where the update popup interrupts flow.
- A popup is in itself a bad thing to do to your users. I think it is OK to prompt when closing the app and let the rest happen in the background.
- Some people actually pay attention to outgoing connections apps make, and even a simple update check every 5 minutes is excessive. Why even do it while the app is running? Do it on startup and ask on close. Again, some complexity: assume you're not on a network, do it in the background, and don't bother retrying much. (A rough sketch of that flow follows below this list.)
- Additional complexity in the app caused all of the above. And it came with a price tag for the developer.
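A minimal sketch of that kind of flow in an Electron main process, assuming hypothetical checkForUpdate() and downloadUpdate() helpers (all names and fields are illustrative, not the app's actual code):

    // Sketch: check once on startup, ask before downloading, never download twice.
    import { app, dialog } from "electron";

    declare function checkForUpdate(): Promise<{ version: string; sizeMB: number } | null>;
    declare function downloadUpdate(info: { version: string; sizeMB: number }): Promise<void>;

    let updateDownloaded = false;

    async function maybeUpdateOnStartup(): Promise<void> {
      if (updateDownloaded) return;                 // never re-download the same update
      const info = await checkForUpdate();          // one small JSON request, on startup only
      if (!info || info.version === app.getVersion()) return;

      // Ask before pulling a large file; don't assume cheap or unmetered bandwidth.
      const { response } = await dialog.showMessageBox({
        message: `Version ${info.version} is available (~${info.sizeMB} MB). Download now?`,
        buttons: ["Download", "Later"],
      });
      if (response !== 0) return;

      await downloadUpdate(info);
      updateDownloaded = true;                      // install on next quit/restart
    }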
Wouldn't the App Store be the perfect way to handle updates in this case, offloading the complexity?
Thinking of it, the discussed do-it-yourself update checking is so stupid that malice and/or other serious bugs should be assumed.
malice and/or other serious bugs should be assumed
Going back to the blog post and re-reading it with this possibility in mind is quite a trip.
It turns out thousands of our users had the app running in the background, even though they were not using it or checking it for weeks (!). It meant thousands of users had auto-updater constantly running and downloading the new version file (250MB) over and over again every 5 minutes
This could easily have been data exfiltration from client computers instead, and few (besides the guy whose internet contract got cancelled for heavy traffic) would have noticed.
Screen Studio has 32k followers; let's say 6% are end users, so about 2,000 users at $229, which is roughly $458k in revenue and about $137k in App Store fees at Apple's 30% cut.
I am going to say writing your own app update script is a wash time-wise, as getting your app published is not trivial, especially for an app that requires as many permissions as Screen Studio.
If you’re a small shop or solo dev, it is real hard to justify going native on three platforms when electron gives it for (near) free. And outside of HN, no one seems to blink at a 250MB bundle.
There are alternatives like Tauri that use the system browser and allow substantially smaller bundles, but they’re not nearly as mature as Electron, and you will get cross platform UI bugs (some of which vary by user’s OS version!) from the lack of standardization.
And outside of HN, no one seems to blink at a 250MB bundle.
I can remember when I would have to leave a 250MB download running overnight.
Before that, I can remember when it would have filled my primary hard drive more than six times over.
... Why can't the app-specific code just get plugged into a common, reusable Electron client?
Tauri is an alternative framework that uses whatever web view the OS provides, saving ~200mb bundle size. On Mac that’s a (likely outdated) version of Safari. On Windows it’ll be Edge. Not sure what Linux uses, I’d guess it varies by distro.
The promise of Electron (and it’s an amazing value prop) is that your HTML/JS UI will always look and work the same as in your dev environment, no matter what OS the host is running.
I don’t have the time or inclination to test my app on the most recent 3 releases of the most popular operating systems every time I change something in the view layer. With Electron, I trade bundle size for not having to do so.
I do think alternatives like Tauri are compelling for simple apps with limited UI, or where a few UI glitches are acceptable (e.g. an internal app). Or for teams that can support the QA burden.
And outside of HN, no one seems to blink at a 250MB bundle.
Please, many people connect to the internet via a mobile phone hotspot, at least occasionally.
This bug would likely cause you to go through your entire monthly data in a few hours or less.
You should probably not roll your own auto-updater.
If you do, checking every 5 minutes for updates is waaaay too often (and likely hurts battery life by triggering the radio).
And triggering a download without a user-prompt also feels hostile to me.
The app size compounds the problem here, but the core issue is bad choices around auto-updating.
And outside of HN, no one seems to blink at a 250MB bundle.
Except like 1 or maybe 2 billion people with slow or expensive internet.
I’d actually seen this project before because the author did a nice write up on using React portal to portal into electron windows[1], which is something I decided to do in my app.
I’d just assumed his was a cross platform project.
1: https://pietrasiak.com/creating-multi-window-electron-apps-u...
That was a thing I thought was missing from this writeup. Ideally you only roll out the update to a small percentage of users. You then check to see if anything broke (no idea how long to wait, 1 day?). Then you increase the percentage a little more (say, 1% to 5%) and wait a day again and check. Finally you update everyone (who has updates on).
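A minimal sketch of how that gating is often done, assuming each install has a stable ID and the server advertises a rollout percentage (both are assumptions, not anything from the post):

    // Sketch: deterministic percentage-based rollout bucketing.
    // Hashing the install ID keeps each install in the same bucket across checks.
    import { createHash } from "node:crypto";

    function isInRollout(installId: string, rolloutPercent: number): boolean {
      const digest = createHash("sha256").update(installId).digest();
      const bucket = digest.readUInt32BE(0) % 100;  // stable bucket in 0..99
      return bucket < rolloutPercent;               // e.g. 1, then 5, then 100
    }

    // Only offer the update if the server-advertised rollout covers this install:
    // if (isInRollout(installId, updateInfo.rolloutPercent)) { offerUpdate(); }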
Wouldn't the App Store be the perfect way to handle updates
But then the HN crowd would complain "why use an app store? that's gate keeping, apple could remove your app any day, just give me a download link, and so on..."
You literally can't win.
The number of times I have caught junior or even experienced devs writing potential PII leaks is absolutely wild. It's just crazy easy in most systems to open yourself up to potential legal issues.
The context where it makes the most sense is accepting code from strangers in a low-trust environment.
The alternative to trying to prevent mistakes is making it easy to find and correct them. Run CI on code after it’s been merged and send out emails if it’s failed. At the end of a day produce a summary of changes and review them asynchronously. Use QA, test environments, etc.
Code reviews kill velocity - introduce context switching, and are make work
This is the same point three times, and I don't agree with it. This is like saying tests kill velocity, there's nothing high velocity about introducing bugs to your code base.
Everything introduces context switching, there's nothing special about code reviews that makes it worse than answering emails, but I'm not going to ignore an important email because of "context switching."
Everyone makes mistakes, code reviews are a way to catch those. They can also spread out the knowledge of the code base to multiple people. This is really important at small companies.
CI is great, but I have yet to see a good CI tool that catches the things I do.
This is the same point three times
No it isn’t. Fake work, synchronization, and context switching are all separate problems.
code reviews are a way to catch those
I said you can do reviews - but there is no reason to stop work to do them.
Why not require two or three reviews if they are so helpful at finding mistakes?
I agree everyone makes mistakes - that’s why I would design a process around fixing mistakes, not screening for perfection.
How many times have you gone back to address review comments and introduced a regression because you no longer have the context in your head?
No it isn’t. Fake work, synchronization, and context switching are all separate problems
Context switching is a problem because it...kills velocity. Fake work is a problem because it kills velocity. You're saying it's time that could be better spent elsewhere, but trying to make it sound wider. I disagree with the premise.
Synchronization is a new word, unrelated to what you originally wrote.
How many times have you gone back to address review comments and introduced a regression because you no longer have the context in your head?
Never? I am not unable to code in a branch after a few days away from it. If I were, I would want reviews for sure! Maybe you have had reviews where people are suggesting large, unnecessary structural changes, which I agree would be a waste of time. We're just looking for bug fixes and acceptably readable code. I wouldn't want reviewers opining on a new architecture they read about that morning.
Synchronization is a new word, unrelated to what you originally wrote.
I believe you can figure it out.
Never?
Ok well I’m trying to talk to people who have that problem. Because I and my team do.
Why not require two or three reviews if they are so helpful at finding mistakes?
For safety-critical software, e.g. ASIL-D, you will absolutely have a minimum of 2 reviewers. And that's just for the development branch. Merging to a release branch requires additional sign-offs from the release manager, safety manager, and QA.
By design the process slows down “velocity”, but it definitely increases code quality and reduces bugs.
Why not require two or three reviews if they are so helpful at finding mistakes?
Places do. A lot of open-source projects have the concept of dual reviews, and a lot of code bases have CODEOWNERS to ensure the people with the context review the code, so you could have 5-10 reviewers if you do a large PR.
Why not require two or three reviews if they are so helpful at finding mistakes?
Diminishing returns, of course. I have worked places where two reviews were required and it was not especially more burdensome than one, though.
I catch so many major errors in code review ~every day that it's bizarre to me that someone is advocating for zero code review.
Code reviews kill velocity
This feels like a strange sense of priorities which would be satirised in a New Yorker/Far Side single-panel comic: “Sure, my mistake brought down the business and killed a dozen people, but I’m not sure you appreciate how fast I did it”.
Code should be correct and efficient. Monkeys banging their heads against a keyboard may produce code fast, but it will be brittle and you’ll have to pay the cost for it later. Of course, too many people view “later” as “when I’m no longer here and it’s no longer my problem”, which is why most of the world’s software feels like it’s held together with spit.
would be satirised in a New Yorker/Far Side single-panel comic:
Thanks for taking my experience and comment seriously and challenging your preconceptions.
Code should be correct and efficient.
When it ships to customers. The goal is to find the bugs before then. Having a stable branch can be accomplished in many ways besides gating each merge with a review.
Do you have any studies to show how effective synchronous code review is in preventing mistakes? If they are such a good idea why not do 2 or 3?
Thanks for taking my experience and comment seriously and challenging your preconceptions.
I apologise if my comment read as mean. I wanted to make the joke and it may have overshadowed the point.
Do you have any studies to show how effective synchronous code review is in preventing mistakes?
I could’ve been clearer. I’m not advocating for code reviews, I’m advocating for not placing “velocity” so high on the list of priorities.
If they are such a good idea why not do 2 or 3?
This argument doesn't really make sense, though. You've probably heard the expression "measure twice, cut once": you don't keep measuring over and over, you do it just enough to ensure it's right.
The purpose of such a review is a deliberate bottleneck in the earlier stage of development to stop it becoming a much larger bottleneck further down the line. Blocking one PR is a lot cheaper than blocking an entire release, and having a human in the loop there can ensure the change is in alignment in terms of architecture and engineering practices.
CI/CD isn’t the only way to do it but shifting left is generally beneficial even with the most archaic processes.
The up-front cost of code review can easily be tripled or quadrupled when it's distributed over several weeks
You’re taking a more extreme position than the one I’m stating. You can review every day or every hour if you want.
a deliberate bottleneck in the earlier stage
Wouldn't it be better if we could catch bugs AND avoid the bottleneck? That's the vision. People with good intentions may disagree about how to accomplish that.
Like it or not you still have to stop what you’re doing to identify a bug and then fix it, which takes time away from planned feature work. You’re not optimising anything, you’re just adding fragility to the process.
As I said before, an issue localised to a PR in review blocks one person. An issue that has spread to staging or prod blocks the entire team.
Code reviews kill velocity
Yes, they kill your velocity. However, the velocity of a team can be massively increased by shipping small things a lot more often.
Stable branches that sit around for weeks are the real velocity killer, and make things a lot more risky on deployment.
The website makes it seem like it's a one person shop.
If you're not confident you can review a piece of code you wrote and spot a potentially disastrous bug like the one in OP, write more tests.
1) Emergency update for remote exploit fixes only
2) Regular updates
The emergency update can show a popup, but only once. It should explain the security risk. But allow the user to decline, as you should never interrupt work in progress. After a decline, leave an always-visible small warning banner in the app until approved.
The regular update should never pop up, only show a very mild update reminder that is NOT always visible, instead tucked behind a menu that is frequently used. Do not show notification badges; they frustrate people with an inbox-zero type of condition.
This is the most user friendly way of suggesting manual updates.
You have to understand, if a user has 30 pieces of software that each update about once a month, they have to update something every day of the month. That is not a good overall user experience.
You have to understand, if a user has 30 pieces of software that each update about once a month, they have to update something every day of the month. That is not a good overall user experience.
That's not a user issue though, it's a "packaging and distribution of updates" issue which, coincidentally, has been solved for other OSes using a package manager.
How the times have changed ..
The "send and receive" button is seared into my brain
I was in Spain at the time, and at first you had to connect to the Internet through a phone number in France.
Did you guys have something like that?
However, the BBS days were much worse; it was mostly long-distance calls to someone across the country, and they usually only had a couple of connections available, like five or so.
Ah, another thing is that they adopted the same model as mobile phones, so at least we could pre-pay the calls, and when we ran out of cash that was it, no surprise bills, even if frustrating.
It is sort of fun (for $8,000) as it was "just" a screenshotter, but imagine this with a bank app or any other heavily installed app.
All cloud providers should have alerts for excessive use of network by default. And they should ask developers if they really want to turn alerts off.
I remember a Mapbox app that cost much more, just because the provider charged by the month… and there was a great dispute over whose fault it was…
Their users do not care about their screen recording studio anywhere near as much as the devs who wrote it do.
Once a month is probably plenty.
Personally, I disable auto-update on everything wherever possible, because the likelihood of annoying changes is much greater than welcome changes for almost all software I use, in my experience.
If the update interval had been 1 day+, they probably wouldn't have noticed at all, given that it took them a month to notice even with a 5-minute check interval.
This could have easily been avoided by prompting the user for an update, not silently downloading it in the background... over and over.
Once a day would surely be sufficient.
Data centers are big and scary, nobody wanted to run their own. The hypothetical cost savings of firing half the IT department was too good to pass up.
AWS even offered some credits to get started, first hit's free.
Next thing you know your AWS spend is out of control. It just keeps growing and growing and growing. Instead of writing better software, which might slow down development, just spend more money.
Ultimately in most cases it's cheaper in the short term to give AWS more money.
A part of me wants to do a $5 VPS challenge. How many users can you serve with $5 per month? Maybe you'd actually need to understand what your server is doing?
I'm talking nonsense, I know.
Instead of writing better software, which might slow down development, just spend more money.
Except this is unironically a great value proposition.
But, on the AWS marketplace I can click a button, a line item is added to our bill, and infosec are happy because it's got the AWS checkmark beside it. Doesn't matter what it costs, as long as it goes through the catalog.
That’s why big companies use AWS.
At my last job, I worked for a vc backed startup. I reached out to our fund, and they put us in touch with AWS, who gave us $100k in credits after a courtesy phone call.
That’s why startups use AWS
OpenStack has been around 15 years powering this idea at scale for huge organizations, including Wal-Mart, Verizon, Blizzard and more.
outrageous combination of inexperience
Correction--many have years of inexperience. Plenty of people that do things like this have "7 years designing cloud-native APIs".
Every problem can be solved by just giving AWS another $100,000 this month because we don't have time (and don't know how) to make even basically optimized software.
Don't forget the Java + Kafka consultants telling you to deploy your complicated "micro-service" to AWS, and you ending up spending tens of millions on their "enterprise optimized compliant best practice™" solution, so you end up needing to raise money every 6 months instead of saving costs as you scale up.
Instead, you spin up more VMs and pods to "solve" the scaling issue, with which you lose even more money.
It is a perpetual scam.
Good way of showing adoption and growth.
Nobody under any circumstances needs usage stats with 5 minute resolution. And certainly not a screen recorder.
Once a day would surely be sufficient.
Weekly or monthly would be sufficient. I'd also like "able to be disabled manually, permanently" as an option, too.
There are only a few applications with exposed attack surface (i.e. accept incoming requests from the network) and a large enough install base to cause "massive damage all of the Internet". A desktop screen recorder app has no business being constructed in a manner that's "wormable", nor an install base that would result in significant replication.
The software that we need the "average user" to update is stuff like operating systems. OS "manufacturers" have that mostly covered for desktop OS's now.
Microsoft, even though their Customers were hit with the "SQL Slammer" worm, doesn't force automatic updates for the SQL Server. Likewise, they restrict forcing updates only to mainstream desktop OS SKUs. Their server, embedded, and "Enterprise" OS SKUs can be configured to never update.
Once a day would surely be sufficient.
Well they might need to rush out a fix to a bug that could be harmful for the user if they don't get it faster.
For example, a bug that causes them to download 250MB every 5 minutes.
Just amazed. Yeah, 'write code carefully', as if suggesting that will fix it, is a rookie mistake.
So so frustrating when developers treat user machines like their test bed!
After I shipped a bug the Director of Engineering told me I should "test better" (by clicking around the app). This was about 1 step away from "just don't write bugs" IMO.
TBH, that was well done for what it was but really called for automation and lacked unit-testing.
Although, after such a fuck up, I would be tempted to make a pre-release check that tests the compiled binary, not any unit test or whatever. Use LD_PRELOAD to hook the system timing functions(a quick google shows that libfaketime[0] exists, but I've never used it), launch the real program and speed up time to make sure it doesn't try to download more than once.
Then it's a unit test that looks too obvious to exist until you read the ticket mentioned in the comment above it
No need for monkey patching or hooking or preload
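Something like this with fake timers would have caught it; a sketch assuming the updater exposes a start function with an injectable download callback (startAutoUpdater and its options are hypothetical names, not the app's real API):

    // Jest-style sketch: assert the periodic check downloads at most once.
    import { expect, jest, test } from "@jest/globals";

    declare function startAutoUpdater(opts: {
      intervalMs: number;
      download: () => Promise<void>;
    }): void;

    test("downloads the update at most once while idle", async () => {
      jest.useFakeTimers();
      const download = jest.fn(async () => {});

      startAutoUpdater({ intervalMs: 5 * 60 * 1000, download });

      // Simulate a full day of the app sitting idle in the background.
      await jest.advanceTimersByTimeAsync(24 * 60 * 60 * 1000);

      expect(download).toHaveBeenCalledTimes(1); // not 288 times
    });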
But before that you add a couple checkmarks to the manual pre-release test list: "1 hour soak test" and "check network transfer meters before and after, expect under 50 MB used in 1 hour (see bug #6969)"
In Linux they're under /sys/class/net I think
From TFA: "Write your auto-updater code very carefully. Actually, write any code that has the potential to generate costs carefully." So the focus is on code that "generate[s] costs". I think this is a common delusion programmers have; that some code is inherently unrelated to security (or cost), so they can get lazy with it. I see it like gun safety. You have to always treat a gun like it's loaded, not because it always is (although sometimes it may be loaded when you don't expect it), but because it teaches you to always be careful, so you don't absent-mindedly fall back into bad habits when you handle a loaded one.
Telling people to write code carefully sounds simplistic but I believe for some people it's genuinely the right advice.
Avoidable and unfortunate, but the cost of slowing down development progress by, e.g., 10% is much higher.
But I agree that senior gatekeepers should know by heart some places where review needs to be extra careful. Like security pitfalls, exponential backoff in error handling, and yeah, probably this.
What did the CEO think of this?
I doubt there’s a CEO. Despite the use of “we”, pretty sure this is one guy building the app. All the copyright notices and social media go back to one person.
But I agree that senior gatekeepers should know by heart some places where review needs to be extra careful. Like security pitfalls, exponential backoff in error handling, and yeah, probably this.
The lesson here is that much better use of automated tests (the app likely has no tests at all) and proper use of basic testing principles like TDD would prevent such junior-level, embarrassing bugs from creeping into paid production software.
That is the difference between a $100 problem vs a $200M problem.
See the case of Knight Capital[0] who lost $460M, due to a horrific deploy.
[0] https://www.henricodolfing.com/2019/06/project-failure-case-...
On one hand it's good that the author owns up to it, and they worked with their users to provide remedies. But so many things aren't adding up. Why does your screen recorder need to check for updates every 5 minutes? Once a day is more than enough.
This screams "We don't do QA, we just ship".
Or, given it's a Mac app, just have the Mac App Store take care of updates. That's part of the value that using the App Store service gives you.
And pay Apple their 30% cut on your revenue? No thanks.
The other part being not spending thousands in accidental data transfer when you do auto-updates wrong.
Or just actually write proper automated tests for basic features first, before a large refactor, to prevent issues like this from happening again?
While I respect the author's honesty about this mistake, the main takeaway here goes unmentioned, and that is just writing proper automated tests, as the impression this post gives is that there aren't any.
It was good enough for netflix etc.
*I* don't want applications to be able to update themselves. Look at malware like Zoom, for example.
It's funny that people don't like telemetry, but at the same time they're ok with regular software update checks + installs.
It's just tricky, basically one fat edge case, and a critical part of your recovery plan in case of serious bugs in your app.
(This bug isn't the only problem with their home-grown updater. Checking every 5 min is just insane. Kinda tells me they aren't thinking much about it.)
You can use whatever you want outside of the App Store - most will use Sparkle to handle updates https://sparkle-project.org/. I presume Windows is similar.
Apple doesn't "know best" - it's just that that is what the system package manager is.
The fact that that is what the system package manager is is why I said Apple "knows best". You can pick from dozens of system package managers hooked up to hundreds, if not thousands, of different repos on Linux.
I'm pretty conservative about adopting third-party libraries (due to the long-term issues each one has the potential to cause), but an app updater is probably worth it.
Especially for a Mac-only application where Sparkle (https://sparkle-project.org/) has been around for almost two decades now and has been widely used across all sorts of projects to the point that it's a de facto standard. I'd be willing to bet that almost every single Mac "power user" on the planet has at least one application using Sparkle installed and most have a few.
We used Sparkle, https://sparkle-project.org/, to do our updates. IMO, it was a poor choice to "roll their own" updater.
Our application was very complicated and shipped with Mono... And it was only about ~10MB. The Windows version of our application was ~2MB and included both 32-bit and 64-bit binaries. WTF are they doing shipping a 250MB screen recorder?
So, IMO, they didn't learn their lesson. The whole article makes them look foolish.
The Windows version of our application was ~2MB and included both 32-bit and 64-bit binaries. WTF are they doing shipping a 250MB screen recorder?
Electron.
So, IMO, they didn't learn their lesson. The whole article makes them look foolish.
The lesson is to do better testing and write automated tests and don't roll your own updater.
What might be fun is figuring out all the ways this bug could have been avoided.
Another way to avoid this problem would have been using a form of “content addressable storage”. For those who are new, this is just a fancy way of saying make sure to store/distribute the hash (ex. Sha256) of what you’re distributing and store it on disk in a way that content can be effectively deduplicated by name.
It's probably not so easy as to make it a rule, but most of the time, an update download should probably do this.
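A minimal sketch of that idea, assuming the update manifest advertises a SHA-256 for the payload (the cache layout and field names are made up for illustration):

    // Sketch: name the cached file by its content hash and skip re-downloading it.
    import { createHash } from "node:crypto";
    import { existsSync, readFileSync } from "node:fs";
    import { join } from "node:path";

    function alreadyDownloaded(cacheDir: string, expectedSha256: string): boolean {
      const path = join(cacheDir, `${expectedSha256}.zip`); // file named by its hash
      if (!existsSync(path)) return false;
      const actual = createHash("sha256").update(readFileSync(path)).digest("hex");
      return actual === expectedSha256;                      // re-verify partial writes
    }

    // if (!alreadyDownloaded(cacheDir, manifest.sha256)) { await download(manifest.url); }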
out all the ways this bug could have been avoided.
The most obvious one is setting up billing alerts.
Past a certain level of complexity, you're often better off focusing on mitigation that trying to avoid every instance of a certain kind of error.
https://en.m.wikipedia.org/wiki/Knight_Capital_Group#2012_st...
$440M USD
The url specifically asks Wikipedia to serve the mobile site.
In the grand scheme of things, $8k is not much money for a business, right? Like we can be pretty sure nobody at Google said “a-ha, if we don’t notify the users, we will be able sneak $8k out of their wallets at a time.” I think it is more likely that they don’t really care that much about this market, other than generally creating an environment where their products are well known.
Curious where the high-water mark is across all HNers (:
Our team had a bug that cost us about $120k over a week.
Another bug running on a large system had an unmeasurable cost. (Could be $K, could be $M.)
As a designer, I value the experience the product I create provides to its users. And this was not even just a bad experience; it was actually harmful.
$229 per year on a closed source product and this is the level of quality you can expect.
You can have all the respect for users in the world, but if you write downright hazardous code then you're only doing them a disservice. What happened to all the metered internet plans you blasted for 3 months? Are you going to make those users whole?
Learning from and owning your mistake is great and all, but you shouldn't be proud or gloating about this in any way, shape, or form. It is a very awkward and disrespectful flex on your customers.
This was back in the Rails days, before they switched to Scala.
I heard that there was a fail-whale no one could solve related to Twitter's identity service. IIRC, it was called "Gizmoduck."
The engineer who built it had left.
They brought him in for half a day of work to solve the P0.
*Supposedly*, he got paid ~50K for that day of work.
Simultaneously outrageous but also reasonable if you've seen the inside of big tech. The ROI is worth it.
That is all.
Disclaimer: don't know if it's true, but the story is cool.
We decided to take responsibility and offer to cover all the costs related to this situation.
Good on them. Most companies would cap their responsibility at a refund of their own service's fees, which is understandable as you can't really predict costs incurred by those using your service, but this is going above and beyond and it's great to see.
At some scale, such careless mistakes are going to create real effects for all users of the internet through congestion as well.
If this was not a $8000 mistake but was somehow covered by a free tier or other plan from Google Cloud, would they still have considered it a serious bug and fixed it as promptly?
How many such poor designs are out there generating traffic and draining common resources.
The title should have been: "how a single line of code cost our users probably more than $8000"
Looking at the summary section, I'm not convinced these guys learned the right lesson yet.
A giant ship’s engine failed. The ship’s owners tried one ‘professional’ after another but none of them could figure out how to fix the broken engine.
Then they brought in a man who had been fixing ships since he was young. He carried a large bag of tools with him and when he arrived immediately went to work. He inspected the engine very carefully, top to bottom.
Two of the ship’s owners were there watching this man, hoping he would know what to do. After looking things over, the old man reached into his bag and pulled out a small hammer. He gently tapped something. Instantly, the engine lurched into life. He carefully put his hammer away and the engine was fixed!!!
A week later, the owners received an invoice from the old man for $10,000.
What?! the owners exclaimed. “He hardly did anything..!!!”.
So they wrote to the man; “Please send us an itemised invoice.”
The man sent an invoice that read:
Tapping with a hammer………………….. $2.00
Knowing where to tap…………………….. $9,998.00
That way I guess you get the caching of the DNS network for free, it uses basically one packet each way, encryption is still possible, and it can reduce the traffic greatly if a big org is running a thousand instances on the same network
I think it was written in Go. Might have been Syncthing
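For illustration only (not how any particular project actually does it), a DNS-based check can be as small as a single TXT lookup; the record name here is made up:

    // Sketch: publish the latest version as a DNS TXT record and compare locally.
    import { resolveTxt } from "node:dns/promises";

    async function latestVersionViaDns(): Promise<string | null> {
      try {
        const records = await resolveTxt("version.updates.example.com");
        // TXT answers arrive as arrays of string chunks, e.g. [["v=2.7.1"]].
        const match = records.flat().join("").match(/v=([\w.\-]+)/);
        return match ? match[1] : null;
      } catch {
        return null; // resolver failure: just try again on the next scheduled check
      }
    }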
If the file contains invalid JS (a syntax error, or features too new for IE on Win7/8), or if it's >1MB (the Chromium-based browsers & Electron limit), and the file is configured system-wide, then EVERY APP which uses wininet starts flooding the server with requests over and over, almost in an endless loop (missing/short error caching).
Over the years, this resulted in DDoSing my own server and blackholing its IP at the BGP level (happened 10+ times), and after switching to public IPFS gateways to serve the files, the Pinata IPFS gateway blocked an entire country, and on the IPFS.io gateway the files were in the top 2 requests for weeks (impacting the operational budget of the gateway).
All of the above happens with tight per-IP per-minute request limits and other measures to conserve bandwidth. It's used by 500,000+ users daily. My web server is a $20/mo VPS with unmetered traffic, and thanks to this, I was never in the same situation as the OP :)
For those interested in this topic, and how other industries (e.g. Airline industry) deal with learning from or preventing failure: Sidney Dekker is the authority in this domain. Things like Restorative Just Culture, or Field guide to understanding human error could one day apply to our industry as well: https://sidneydekker.com/books.
The relevance is that instead of checking for a change every 5 minutes, the delay wasn't working at all, so the check ran as fast as possible in a tight loop. This was between a server and a blob storage account, so there was no network bottleneck to slow things down either.
It turns out that if you read a few megabytes 1,000 times per second all day, every day, those fractions of a cent per request are going to add up!
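Back-of-envelope, with an illustrative price of $0.004 per 10,000 read operations: 1,000 requests/second is about 86.4 million requests a day, which is roughly 8,640 × $0.004 ≈ $35/day from the per-operation charge alone, before any egress or storage costs.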
Good thing this was not Shopify/Duolingo/MSFT, else the news would be "how AI saved us $8k by fixing dangerous code" and "why AI will improve software quality".
While refactoring it, I forgot to add the code to stop the 5-minute interval after the new version file was available and downloaded.
I'm sorry, but it's exactly cases like these that should be covered by some kind of test, especially when diving into a refactor. Admittedly, it's nice to hear people share their mistakes and horror stories; I would get some stick for this at work.
Write your auto-updater code very carefully.
You have to be soooo careful with this stuff. Especially because your auto-updater code can brick your auto-updater.
It looks like they didn't do any testing of their auto update code at all, otherwise they would have caught it immediately.
The app checks for the update every 5 minutes or when the user activates the app. Normally, when the app detected the update - it downloaded it and stopped the 5 minutes interval until the user installed it and restarted it.
This is still bad. I was really hoping the bug would have been something like "I put a 5 minute check in for devs to be able to wait and check and test a periodic update check, and forgot to revert it". That's what I expected, really.
It's best to save everyone by writing tests that prevent a $100 issue on your machine from becoming a costly $10M+ problem in production as the product scales after it has launched.
This won't be the last time, and this is what 'vibe coding' doesn't consider; it will introduce more issues like this.
Seriously this alone makes me question everything about this app.
Add special signals you can change on your server, which the app will understand, such as a forced update that will install without asking the user.
I understand the reasoning, but that makes it feel a bit too close to a C&C server for my liking. If the update server ever gets compromised, I imagine this could increase the damage done drastically.
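One common way to blunt that risk is to only honor such signals from a signed manifest; a sketch assuming the vendor embeds a public key in the app and signs the manifest offline (the key and field names are illustrative, not anything from the post):

    // Sketch: verify the manifest signature before trusting a "force install" flag.
    import { createVerify } from "node:crypto";

    const VENDOR_PUBLIC_KEY_PEM =
      "-----BEGIN PUBLIC KEY-----\n...placeholder...\n-----END PUBLIC KEY-----";

    interface Manifest {
      payload: string;    // the JSON string that was signed (version, forceInstall, ...)
      signature: string;  // base64 signature produced offline with the private key
    }

    function isTrusted(manifest: Manifest): boolean {
      const verifier = createVerify("SHA256").update(manifest.payload);
      return verifier.verify(VENDOR_PUBLIC_KEY_PEM, manifest.signature, "base64");
    }

    // Even a fully compromised update server cannot forge a manifest that passes
    // this check without the offline private key.
    // if (isTrusted(manifest) && JSON.parse(manifest.payload).forceInstall) { ... }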
A single line of code caused <BUG>
Yes, a single line of code is in the stack trace every time a bug happens. Why does every headline have to push this clickbait?
All errors occur at a single line in the program - and every single line is interconnected to the rest of the program, so it's an irrelevant statement.
Well, you should hire a contractor to set up the console for you.
"Designed for MacOS", aah don't worry, you will have the money from apes back in the no time. :)
I think that is the essence of what is wrong with cloud costs: defaulting to the possibility for everyone to scale rapidly, while in reality 99% have quite predictable costs month over month.