Litestream: Revamped
Recently, I deployed a little side project using a small Postgres VM on Fly.io. After a couple of days, and with only about 500 KB of data stored in that DB, the Postgres VM went into an unrecoverable fail loop: it ran out of memory, restarted, then immediately ran out of memory again, and so on.
It took about 3-4 hours to recover the data, jumping through a lot of hoops to access it, copy it to another volume, and finally download it.
I would've reached out to support, but it seems like the only option available is posting on their forum. I saw a couple of related posts, all with unsatisfactory answers, unfortunately.
To be fair, it was incredibly easy to get up and running with them. On the other hand, almost all the time I saved by that quick start was wasted recovering the failing DB, all the while my site was down.
Ironically, I originally developed the project using sqlite, but then switched to postgres to deploy
For Postgres I ended up going with Neon (neon.tech), and I'm very happy with them so far. Super easy to set up and get running, and I love being able to just see the data from their web interface.
People sometimes have a hard time with the idea that we write about things because they are interesting to us, and for no other reason. That's also 60-70% of why Ben does what he does on Litestream.
I can't wait to try this out.
With the recent addition of CAS to S3
Do you have a reference for this? I assume by CAS you mean content addressable storage? I googled but can't find any AWS docs on this.
Litestream now depends on this functionality to handle multiple writers. Think of it as optimistic locking.
https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3...
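If it helps, here's roughly what a conditional write looks like with boto3 (assuming a recent SDK that exposes the IfNoneMatch parameter; the bucket/key names and the "lease object" idea are just illustrative, not Litestream's actual layout):

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def put_if_absent(bucket: str, key: str, body: bytes) -> bool:
        """Write only if the key doesn't exist yet (compare-and-swap on creation)."""
        try:
            # If-None-Match: * makes S3 reject the PUT when the object already exists.
            s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
            return True
        except ClientError as e:
            # S3 answers 412 PreconditionFailed when another writer got there first.
            if e.response["Error"]["Code"] == "PreconditionFailed":
                return False
            raise

    # Hypothetical usage: only one process "wins" the object and becomes the writer.
    won = put_if_absent("my-replica-bucket", "db/lease", b"owner=instance-a")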
1. A built-in UI and CLI that manages SQLite from a volume. Getting the initial database on a Fly Machine requires more work than it should.
2. `fly console` doesn't work with SQLite because it spins up a separate machine, which isn't connected to the same volume where the SQLite data resides. Instead you have to know to run `fly ssh console --pty`, which effectively SSH's into the machine with the database.
The problem in general with SQLite web apps is they tend to be small apps, so you need a lot of them to make a decent amount of money hosting them.
TBH I wish they had their managed PG cluster running because it would have made it easier to downsize, but I’m happy with SQLite.
I used SQLite for another project that I knew was going to max out at 100 concurrent users and it worked great. The best moment was when a user reported a production error I couldn't recreate locally, so I downloaded the database and reproduced the error with the latest production data on my laptop. You couldn't do that with a high-compliance app, but that's not most apps.
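If the database is being replicated with Litestream, pulling a local copy for that kind of debugging is one command (the bucket path and output file here are placeholders):

    litestream restore -o /tmp/prod-copy.db s3://my-bucket/my-app.db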
I’m hesitant to outright say “SQLite and Rails is great” because you have to know your app will run on one node. If you know that then it's fantastic.
Should make replicating multi-tenant, per-user SQLite databases a lot more appealing.
So it's often "presented" as "a local database which is replicated by streaming", but perhaps it would be more natural to view it as a kind of "centralized database, but with local caches that are kept near your server code, and sync'd"
I understand it's the same thing, but it makes the intent clearer: the point is to have a local cache you read and update, which is then synced. Or at least it seems clearer to me presented that way :)
read replicas
Can this be done with only Litestream, or is LiteVFS still in development? I looked into this last year but was put off by LiteFS's stated write performance penalty due to FUSE[1]; it's still marked as WIP[2] and hasn't seen updates for over a year.
[1] https://fly.io/docs/litefs/faq/#what-are-the-tradeoffs-of-us...
I guess for discovery it kind of makes sense, but it wouldn't really be "GitHub for Fossil", more like a search engine/portal.
n.b. while looking up the modern state of affairs, I found this gem https://fossil-scm.org/home/doc/trunk/www/qandc.wiki#:~:text... which made me chuckle but is also, let's be real, a barrier to adoption. Yes, I am aware of the rabid army of Bugzilla fans, but I'd straight up quit before working with the garbage that is Bugzilla
There clearly still is some untapped potential in this space, so I am glad benbjohnson is exploring and developing these solutions.
Great that the new release will offer the ability to replicate multiple database files.
Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support
I hope this won't be a hard requirement, since some S3-compatible storage providers don't have this feature (yet). I also currently use the SFTP storage option.
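For reference, an SFTP replica today is just a different replica type in litestream.yml; a minimal sketch (host, user, and paths below are placeholders, and the exact key names are worth double-checking against the SFTP replica docs):

    dbs:
      - path: /var/lib/myapp/app.db
        replicas:
          - type: sftp
            host: backup.example.com:22
            user: litestream
            key-path: /etc/litestream/id_ed25519
            path: /backups/myapp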
Am I understanding correctly that I will be able to restore a database to any point in time at which the litestream process was running? Because auto-checkpointing could consume the WAL while it isn't running?
So, for an extreme example, if the process crashed for an hour between 2:00 and 3:00, I could restore to 1:55 or 3:05, but the information required to restore to a point between 2:00 and 3:00 is lost?
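For context, point-in-time restores are driven by a timestamp flag on the restore command; something like this, with placeholder paths and time:

    litestream restore -o /tmp/db-at-0155.db -timestamp 2025-06-01T01:55:00Z s3://my-bucket/my-app.db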
Previously this would be problematic, as the new instance might miss changes made by the old server. Is this fixed by these new changes?
When I deploy a new version of my python/sqlite web app, I do not replace the whole machine. I just upgrade the python package and restart the systemd service.
If I wanted to reduce downtime I could probably figure out a transition using SO_REUSEPORT. During the transition the old and new processes would be using the db concurrently, so the app would have to respect that case. If part of the upgrade requires a db schema change then I’m not sure how you could avoid some downtime. But I don’t know if it is possible with traditional dbs either.
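For what it's worth, the SO_REUSEPORT part is just a socket option, so a rough sketch of the handoff (assuming Linux; the port and backlog are arbitrary) would be:

    import socket

    # Both the old and the new process set SO_REUSEPORT before bind(), so they can
    # listen on the same port at once; the kernel spreads new connections between them.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # Linux-specific option
    sock.bind(("0.0.0.0", 8080))
    sock.listen(128)

    # Deploy flow: start the new process, wait until it's listening and healthy,
    # then stop the old one gracefully so in-flight requests drain.

Both processes would still share the SQLite file during the overlap, so WAL mode plus a busy_timeout is what makes that concurrent window tolerable.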
Your new service will come up, but it won't be able to get the write lease until the previous server shuts down. Now you have tools to detect this, stop one writer, and start the other, but the service will likely experience some kind of request queueing or downtime.
You might start running into problems at tens or hundreds of thousands of writes a second, though even then you may be OK on the right hardware.
One isn't better than the other. But LiteFS isn't a "distributed SQLite" in the sense you'd think of with rqlite. It's a system for getting read-only replicas, the same way you've been able to do with log shipping on n-tier databases for decades.
we're using it in production for a write-heavy internal use-case (~12GB compressed) for more than a year now, and it's costing us a couple hundred pennies per month (azure).
excited to try the new changes when they land.
I'll be doing a similar deployment later this year and enjoy reading on the topic.
Really nice to see this, I wrote this comment almost 2 years ago when I was a little miffed about trying to use litestream and litefs: https://news.ycombinator.com/item?id=37614193
I think this solves most of the issues? You can now freely run litestream on your DB and not worry about issues with multiple writers? I wonder how the handoff is handled.
The read replica FUSE layer sounds like a real nice thing to have.
edit: Ah, it works like this: https://github.com/benbjohnson/litestream/pull/617
When another Litestream process starts up and sees an existing lease, it will continually retry the lease acquisition every second until it succeeds. This low retry interval allows for rolling restarts to come online quickly.
Sounds workable!
Now that we’ve switched to LTX, this isn’t a problem any more. It should thus be possible to replicate /data/*.db, even if there’s hundreds or thousands of databases in that directory.
That was the showstopper. Now multi-tenant setups with a per-tenant database, where (in theory) each user can roll back to a certain point in time, or at least completely download their database and take it away to do whatever they want with it, are going to be possible.
It will be able to fetch and cache pages directly from S3-compatible object storage.
Does this mean your SQLite database size is no longer restricted by your local disk capacity?
LiteVFS is a Virtual Filesystem extension for SQLite that uses LiteFS Cloud as a backing store.
Limitations
- Databases with journal_mode=wal cannot be modified via LiteVFS (but can be read)
- Databases with auto-vacuum cannot be opened via LiteVFS at all
- VACUUM is not supported
Databases with journal_mode=wal cannot be modified via LiteVFS (but can be read)
Without modifying SQLite (which is what the Turso guys did), the WAL index is hard (but not impossible) to share across a network boundary. I'm guessing that's the "why" here. There's a hack that I'm pretty confident works (sure enough that I use it for Windows, and it hasn't caused issues running mptest thousands of times in CI over months), but I'm not sure how it behaves under latency.
Leaving it here, maybe Ben's interested.
https://github.com/ncruces/go-sqlite3/blob/c780ef16e277274e7...
We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what coding agents like Phoenix.new want is a way to try out code on live data, screw it up, and then rollback both the code and the state.
Prescient. Agents would of course work well if they can go back in time to checkpoints and branch from there, exploring solutions in parallel as needed.
Anyone who has experience with building workflows (Amazon SWF, Temporal, and the like) knows how difficult it is to maintain determinism in face of retries & re-drives in multi-tier setups (especially, those involving databases).
Replit recently announced their Agent's integration with Neon's time travel feature[0] for exactly the purpose outlined in TFA. Unlike Fly.io, though, Replit is built on GCP and other third-party providers like Neon, and it's unclear whether GCP and Databricks (which acquired Neon) might go all Oracle on them.