There's no single best way to store information
Also, let us not confuse "relative" with "not objective". My father is objectively my father, but he is objectively not your father.
https://scifi.stackexchange.com/questions/270578/negotiator-...
* For lossless compression of generic data, gzip or zstd.
* For text, documentation, and information without fancy formatting, Markdown, which is effectively a superset of plain text.
* For small datasets, blobs, objects, and what not, JSON.
* For larger datasets and durable storage, SQLite3.
Whenever text is involved, use UTF-8. Whenever dates are involved, use ISO 8601 (UTC) or Unix timestamps.
Following these rules will keep you happy 80% of the time.
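If it helps, a minimal sketch of those defaults in Python (the file names are arbitrary):

    import gzip
    import json
    from datetime import datetime, timezone

    # A small record: JSON for structure, ISO 8601 in UTC for the date.
    record = {
        "title": "notes",
        "body": "plain text, UTF-8 everywhere",
        "created": datetime.now(timezone.utc).isoformat(),
    }

    # UTF-8 JSON on disk; ensure_ascii=False keeps non-ASCII text readable.
    with open("record.json", "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)

    # Lossless generic compression with gzip; zstd works similarly via its own library.
    with open("record.json", "rb") as src, gzip.open("record.json.gz", "wb") as dst:
        dst.write(src.read())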
How would you go about storing those in a somewhat human-readable format? My goal is to archive my chats and social media activity.
You can have another table for attachments (images, videos, etc.). If they're small, store them directly in a BLOB. If they're not, store them alongside the database, and only store the relative path in the attachments table.
You may opt to convert images and videos to a single format (e.g. PNG and H.264 MP4), but you can lose information depending on the target format. It may be preferable to leave them in the original (or highest quality) format.
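A rough sketch of that layout using Python's sqlite3; the table and column names are just illustrative:

    import sqlite3

    conn = sqlite3.connect("archive.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS messages (
            id      INTEGER PRIMARY KEY,
            sender  TEXT NOT NULL,
            sent_at TEXT NOT NULL,   -- ISO 8601, UTC
            body    TEXT             -- UTF-8 message text
        );
        CREATE TABLE IF NOT EXISTS attachments (
            id         INTEGER PRIMARY KEY,
            message_id INTEGER NOT NULL REFERENCES messages(id),
            mime_type  TEXT NOT NULL,
            content    BLOB,         -- small attachments stored inline
            rel_path   TEXT          -- large ones: relative path next to the DB
        );
    """)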
You're reducing definitions and meaning too far to make an ultimately empty point just to contribute to the thread.
If social media's only contribution is language policing, then it really should die off. What a waste of resources so that functionally illiterate nobodies can project ego.
https://en.wikipedia.org/wiki/Data_storage is a different page from https://en.wikipedia.org/wiki/Data_store because they are different, slightly overlapping concepts.
I've been thinking about trade-offs as "pick two of three" in the abstract, but the bookshelf example made it concrete. The insight that matters is: if you know your query patterns, you can optimize differently.
As a PM, I keep trying to build systems that work for "every case." But this article reminded me that's the wrong goal. The hash table works because it accepts the space-time trade-off. The heap works because it embraces disorder for non-priority items.
Sometimes the best system isn't the most elegant one—it's the one that matches how you'll actually use it.
Good reminder to stop over-optimizing for flexibility I'll never need.
Thanks for sharing.
The query itself represents information. If you can anticipate 100% of the ways in which you intend to query the information (no surprises), I'd argue there might be an ideal way to store it.
I'd love to be clued in on more interesting architectures that either attempt to optimize both, or provide a more continuous tuning knob between them.
(fast/optimal) real-time access to new data
https://en.wikipedia.org/wiki/Optimal_binary_search_tree#Dyn...
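For what it's worth, the dynamic program behind that link, in rough Python. This is the textbook O(n^3) recurrence (ignoring dummy keys for unsuccessful searches); the function name is mine:

    def optimal_bst_cost(freq):
        # Minimal expected search cost for a BST over sorted keys with
        # known access frequencies -- the classic O(n^3) dynamic program.
        n = len(freq)
        if n == 0:
            return 0
        prefix = [0]
        for f in freq:
            prefix.append(prefix[-1] + f)
        # cost[i][j] = best cost for the subproblem over keys i..j inclusive.
        cost = [[0] * n for _ in range(n)]
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                j = i + length - 1
                total = prefix[j + 1] - prefix[i]  # each key sits one level deeper
                cost[i][j] = total + min(
                    (cost[i][r - 1] if r > i else 0)
                    + (cost[r + 1][j] if r < j else 0)
                    for r in range(i, j + 1)  # r = choice of root
                )
        return cost[0][n - 1]

    # e.g. optimal_bst_cost([0.5, 0.3, 0.2]) gives the minimal expected comparisons

The catch, as noted above, is that it assumes the access frequencies are known up front.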
Given the domain name, I was expecting something about the physics of information storage, and some interesting law of nature. Instead, the article is a bad introduction to data structures.
Requiring perfect knowledge of how information will be used is brittle. It has the major benefit of making the algorithm design problem tractable, which is why we do it.
An alternative approach is to exclude large subsets of queries from the universe of answerable queries without enumerating the queries the system can answer. The goal is to qualitatively reduce the computational intractability of the universal case by pruning it, without over-specifying the answerable queries the way traditional indexing does. This is approximately what "learned indexing" attempts to do.
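A toy illustration of the learned-indexing idea, assuming the usual recipe (a model predicts a key's position in sorted storage, then a bounded local search corrects the model's error); the linear model and class here are my own simplification, not from any real implementation:

    import bisect

    class LearnedIndex:
        # Toy learned index: a linear model predicts where a key sits in a
        # sorted array; a bounded local search corrects the model's error.
        def __init__(self, keys):
            self.keys = sorted(keys)
            n = len(self.keys)
            # Fit position ~ slope * key + intercept by least squares.
            mean_k = sum(self.keys) / n
            mean_p = (n - 1) / 2
            var = sum((k - mean_k) ** 2 for k in self.keys) or 1.0
            cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
            self.slope = cov / var
            self.intercept = mean_p - self.slope * mean_k
            # Worst-case prediction error bounds the search window.
            self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

        def _predict(self, key):
            return int(self.slope * key + self.intercept)

        def lookup(self, key):
            n = len(self.keys)
            p = min(max(self._predict(key), 0), n - 1)
            lo, hi = max(0, p - self.err), min(n, p + self.err + 1)
            i = bisect.bisect_left(self.keys, key, lo, hi)
            return i if i < n and self.keys[i] == key else None

The point being: the model prunes the search space rather than enumerating an index over every answerable query.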
A consequence of there being no generally superior storage mechanism is that technologists as a community should have an agreed default standard for storage - which happens to be relational.
I'd say this is spiritually what the no-free-lunch theorems are about... Because whatever "AI model" / query system you build -- it is implicitly biased towards queries coming from one slice of futures.
(Or I guess, more generally, the intended effect is zero correlation between the information and the time it takes to retrieve it. If retrieval time were completely random, it would achieve the goal, but it wouldn't have zero variation.)
I doubt we humans will be able to build better storage (faster, more capacity, more analytical, more intuitive, more logical) at an individual level in a few thousand years of civilization; at mass scale that's kinda achieved already by behemoths like Google, etc.
Quantum computing may be the game changer though.
I read somewhere that the entirety of humanity's information, all knowledge and data past (of every human ever) and present, if stored as quantum information, would only be about the size of a football.
I also like (old) .ini / TOML for small (bootstrap) config files / data exchange blobs a human might touch.
Re: PostgreSQL 'unfit' conversations.
I'd like some clearer examples of the desired transactions which don't fit well. After thinking about them in the background a bit, I've started to suspect it might be an algorithmic / approach issue, obscured by storage patterns that happen to be enabled by other platforms which work 'at scale' thanks to hardware (up to a point).
As an example of a pattern that might not perform well under PostgreSQL, consider lock-heavy multiple updates that flush a transaction atomically, e.g. bank-transaction-clearance-style tasks. If every single double-entry booking requires its own atomic transaction, that clearly won't scale well in an ACID system. Rather, the smaller grains of sand should be combined into a sandstone block: a window of transactions that are processed together and applied during the same overall update. The most obvious approach is to switch from a no-intermediate-values 'apply deduction and increment atomically' action to a versioned view of the global data state PLUS a 'pending transactions to apply' log / table (either or both can be sharded). At a given moment the pending transactions can be reconciled; for performance, a cache of 'dirty' accounts can store the non-contested portion of the available balance.
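A rough sketch of that pending-log idea, with sqlite3 standing in for the real database; the schema and function names are illustrative, not a production design:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE pending  (id INTEGER PRIMARY KEY,
                               src INTEGER, dst INTEGER, amount INTEGER);
    """)
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])

    def book(src, dst, amount):
        # Cheap append-only write; no account rows are touched or locked here.
        conn.execute("INSERT INTO pending (src, dst, amount) VALUES (?, ?, ?)",
                     (src, dst, amount))

    def reconcile():
        # One atomic window applies the whole batch, instead of one ACID
        # transaction per individual double-entry booking.
        with conn:
            for src, dst, amount in conn.execute(
                    "SELECT src, dst, amount FROM pending").fetchall():
                conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                             (amount, src))
                conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                             (amount, dst))
            conn.execute("DELETE FROM pending")

The lock contention then concentrates in reconcile(), which can run per shard / per window rather than per booking.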
Conceptually similar to CAP, but with storage trade-offs. The idea is you can only pick 2 out of 3.