-
-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Message ordering #70
Comments
@okdistribute I agree that clock drift is problematic. IRC "solves" this by being realtime and not storing messages, which we don't have the luxury of. This would mean including a vector clock timestamp in each message sent, right? The clock will grow linearly with N, the number of participants in the cabal. I'm concerned about this. A cabal with 1000 users (over its lifetime) and a per-clock size of 4 bytes (32-bit unsigned int), means every cabal message has an additional overhead of ~4kb. A cabal with 10k messages will then contain up to 4mb of vector clock data. Another approach, which hyperlog used (and we use in Mapeo) is to include an array of |
OK thanks @noffle, that makes sense. Does that cause performance tradeoffs for indexing or displaying the most recent messages in order? Just curious as usually disk space & performance have tradeoffs. |
A simple approach to experiment with is using a single monotonically increasing clock for all users. When you post a message, give it the current time or the highest time you've seen in any other message (plus one), whichever is higher. Sorting messages by this value puts them into one of the possible causal orderings. However, a bad actor can post a very high timestamp and cause everyone trouble with integer overflow. Maybe Cabal users trust each other enough to not do that? Or maybe it's possible to ignore messages with times too far in the future. This does not give the same richness of causal information as the backlinks approach described by @noffle . That extra info might enable a better user experience when old messages arrive? |
Oops, I think I reinvented Lamport clocks. 😬 I guess it's time for me to read some theory... |
@okdistribute compared to vector clocks, which you can easily write a
sort function for, the links approach would require an index to do the
sorting. Once sorted though, querying would be the same as it is
currently (a leveldb range query).
I'm not sure what the algorithm would be for this. In Mapeo we'd iterate
over historic records by literally following the links, which would
probably be much too slow for displaying a list of messages in realtime.
|
when i was in berlin, peg linked me a paper on bloom clocks. i haven't done any deep reading of the paper, so i can't really say anything about tradeoffs, but i imagine it solves the space-efficiency problem of pure vector clocks. here's also a really nice tutorial on bloom filters peg had lying around paper abstract
from wikipedia:
that said i've never heard of bloom filters being used productively lol |
This is all useful information and the indexing/sorting problem makes me think it might be worth the extra 4mb in the case that a group gets to 1000k+ people to reduce sorting time on messages. Implementing a bloom clock/filter could be cool but seems like more work |
It was 1k users @ 10k messages. So even with a modest number of users, the cabal's size will grow much faster than a timestamping method that is independent of the number of participants. Since this is a db change and not just a protocol change, it will be harder to modify in the future. @okdistribute have you noticing this problem with clock drift happening much in cabal? Another solution could be a node using cabal peers as NTP servers. A node could look at the reported system time of its peers, and decide if it wants to offset its own clock to try and match theirs. It wouldn't be a trusted clock, but might be good enough for a medium-trust environment. Another trick, which I learned from writing realtime multiplayer games, is to figure out your peers' clock offset, and apply those offsets locally. The idea is that when connecting to a peer, you share your system time with each other. You can use this to figure out the offset of that peer's clock relative to yours, and apply that to the messages you receive from them. This is cool because it doesn't matter what time anyone's clock is set to: each peer has a subjective relative view! The downside is that you'd only know offsets for peers you can directly connect to. There might be some kind of scheme where you receive transitive information about other peers' reported offsets, but I haven't thought that through & there's always security challenges when you're trusting other peers' subjective reports. |
@noffle like I said, I haven't noticed it because I'm always online. It's more of a its-not-a-problem-until-it-is, you can leave it the same and probably nothing bad will happen until someone's messages are sorted at the beginning of time, and no one sees them, but then no one would probably notice To note, that's 4mb if you download the entire history of the cabal, which hopefully wouldn't have to be the case if we have an easy way to request the last X messages (which may be easier if they're sorted by a number!) |
@noffle revisiting this as some folks today noticed their clocks were a couple seconds off from each other while chatting.. XD using cabal peers as NTP servers is an interesting hack for sure! |
Please feel free to disregard if you don't want to go ahead with this, but this is a small tweak that I'd like to propose that I think will improve the reliability of cabal especially for mobile, where clocks are less reliable.
The messages in cabal are currently ordered by 'real' timestamp and also here. This poses an issue for the often case where 'real' clocks are not reliable, if you go offline for 24+hrs or a few days, you'll start to notice your electronics will suffer from 'clock drift' .. or sometimes will just reset randomly to some date in January 2017 or something like that.
An improvement on this is where you take a number and add it to the 'real' clock time, and this number is based upon other numbers you have seen for messages in your log. This is called a vector clock. A simpler implementation can be done as a lamport clock. Just a silly old European way of taking a simple idea and putting someone's name on it, but it's pretty straightforward.
This is a quick watch and a really useful one if you have time: https://www.dotconferences.com/2019/12/james-long-crdts-for-mortals
The benefits for Cabal would be pretty unnoticed for most people especially us, since we are all usually connected to the Internet. But it might also help for messages that come 'back in time' as it would be more clear which exact message before (based on the number), that the person was responding to or messaging after.
Curious what people think about this change?
The text was updated successfully, but these errors were encountered: