Bloom Filters by Example
I probably could have made $100,000,000 if I had made a different choice there.
I wrote a hashcash implementation on a GPU 10 years ago. Pretty sure it's valueless now.
I'm quite sure I wrote one of the first GPU optimized bioinformatics toolkits ... out of curiosity, on a GeForce 8, using CUDA v1(!) back in 2009.
Then went on to do other things ... missed the big bux.
For sets that are plausibly sometimes going to be small where you're going to do a lot of membership checks, you can speculatively add a 64 bit bloom filter with a trivial hash function.
This sounds really stupid, but the cost of doing this is so small you can do it as a gamble. If it doesn't work out you've added like 10ns to your insertions and membership checks, but when it does work out, you can save an incredible amount of work.
- querySelector() in certain cases
- Prefiltering hash lookups in CSS buckets
- Rapid reject of elements when looking for certain Aria attributes (for accessibility)
It's surprising that such tiny filters (32 or 64 bits) work at all, but they often do. There are also some larger Bloom filters around.(I added some of these)
It appears that perfect hash is the one that works the best for my use case.
But if you put things into the perfect hash function it is not expecting, some fraction of them will collide.
If you're searching for a fixed set, look at the Ragel library. Compile-time generation of the search in a way that is very hard to beat.
All three concepts are key in understanding operations of complex distributed systems.
It will demonstrate why the answer is always "maybe it's in the set!" instead of "yes".
After being aware of bloom filters for a while it was quite satisfying to organically find a real use for one that was such an improvement.
This one compares bloom and coockoo filters, so it adds a little more information.
There is a only bloom filter calculator that lets you just mess arpund with parameters
When you add a string, you can see that the bits at the index given by the hashes are set to 1. I've used the color green to show the newly added ones
because all I saw was the content text of "Your set:" changing, no bits, no colors. It took me trying a different browser and then scrolling up and down to realize that it's the visualization two paragraphs above that was changing.