Thoughts on improving chat systems #5
"Oh boy" indeed. Speaking as the Matrix project lead, some of the opinions here on Matrix feel unhelpfully harsh / dated / inaccurate - I wish there'd been the option to discuss prior to publication.
This is like saying "the source of all the problems in git is that you have to merge branches together". There is nothing intrinsically wrong with Matrix's state resolution algorithm. Sure, the initial beta version had bugs which we fixed for 1.0 - and sure, a naive implementation is a performance dog. Just like the beta versions of git were probably unperformant and buggy too. But we spent ages making damn sure that we got the thinkos solved and also did the work to make damn sure that performant implementations are possible. As it happens performance right now is okay enough that we haven't prioritised implementing incremental state res as per that design, but it's not an intrinsic "problem" in the protocol.
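To make the git analogy concrete, here is a minimal sketch of the first step any state resolution has to take: given the room state seen by several forks of the room DAG, split it into entries every fork agrees on and entries that conflict. It is loosely modelled on how Matrix's state resolution v2 separates "unconflicted" from "conflicted" state, but the types, the `partition_state` function and the event IDs are hypothetical, and the hard part (ordering and authorising the conflicted events) is deliberately not shown.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplified model: a room's state maps (event_type, state_key) to an event ID.
StateKey = Tuple[str, str]
StateMap = Dict[StateKey, str]


def partition_state(forks: List[StateMap]) -> Tuple[StateMap, Dict[StateKey, Set[str]]]:
    """Split the state of several forks into unconflicted and conflicted entries.

    A key is unconflicted only if every fork maps it to the same event ID;
    a key that differs between forks, or is missing from some fork, is conflicted.
    """
    all_keys: Set[StateKey] = set()
    for fork in forks:
        all_keys.update(fork)

    unconflicted: StateMap = {}
    conflicted: Dict[StateKey, Set[str]] = {}
    for key in all_keys:
        values = {fork.get(key) for fork in forks}  # None marks "missing in this fork"
        if len(values) == 1 and None not in values:
            unconflicted[key] = values.pop()
        else:
            conflicted[key] = {v for v in values if v is not None}
    return unconflicted, conflicted


# Two forks agree on the room name but saw different topic changes.
fork_a = {("m.room.name", ""): "$name1", ("m.room.topic", ""): "$topic1"}
fork_b = {("m.room.name", ""): "$name1", ("m.room.topic", ""): "$topic2"}
unconflicted, conflicted = partition_state([fork_a, fork_b])
```

Only the conflicted entries need the expensive resolution step, which is why the cost depends on how often forks actually disagree rather than on room size alone.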
We fixed matrix-org/synapse#1760 in matrix-org/synapse#5480 in June. At the time the fix was considered experimental, but having tested it in the wild it actually works fine and we're planning to turn it on by default in Synapse 1.4 (due next week). The only reason it hasn't been turned on earlier was because of an edge case which should land for 1.4.
No, it's not a "somewhat hacky workaround". It's the equivalent of checkpointing a database or incrementally defragmenting a filesystem in order to keep the datastructure in check. We don't "send" dummy events but instead insert them as checkpoint events into the local copy of the room to mitigate fragmentation. I may be biased given I proposed it.
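A rough sketch of the "checkpoint" idea, assuming a toy DAG where each event lists its parents (prev_events): forward extremities are events nothing else points at yet, and a single dummy event that points at all of them collapses the room back to one extremity. The `maybe_insert_dummy` helper and its `max_extremities` threshold are hypothetical; this is not Synapse's code, and a real implementation would also cap how many parents one event may reference.

```python
from typing import Dict, List, Set

# Hypothetical toy DAG: event ID -> list of its prev_events (parents).
Dag = Dict[str, List[str]]


def forward_extremities(dag: Dag) -> Set[str]:
    """Events that no other known event references as a parent."""
    referenced: Set[str] = set()
    for parents in dag.values():
        referenced.update(parents)
    return {event_id for event_id in dag if event_id not in referenced}


def maybe_insert_dummy(dag: Dag, dummy_id: str, max_extremities: int) -> bool:
    """If the local copy has too many extremities, add a dummy event whose
    parents are all current extremities, collapsing them into one."""
    extremities = forward_extremities(dag)
    if len(extremities) <= max_extremities:
        return False
    dag[dummy_id] = sorted(extremities)
    return True


# Three concurrent messages forked off the same parent leave three extremities.
dag: Dag = {"$create": [], "$a": ["$create"], "$b": ["$create"], "$c": ["$create"]}
before = forward_extremities(dag)
inserted = maybe_insert_dummy(dag, "$dummy", max_extremities=1)
after = forward_extremities(dag)
```

Running a check like this whenever events arrive keeps the extremity count bounded in the background, rather than letting it build up until someone has to prune by hand.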
From memory, this was prior to 1.0, where indeed we had some pretty bad performance issues. Since then we have been constantly plugging away at improving performance, which has been slowly but steadily improving ever since. It is excruciatingly frustrating to be judged on bad performance and experience from when we were in beta :(
This is genuinely surprising. I don't believe I have ever experienced a situation on any of the servers I have visibility on (matrix.org, my personal ones, *.modular.im) where messages have not been delivered. The only scenario I know where this could have happened is matrix-org/synapse#2528 - i.e. after an outage, other servers may not retry sending to a previously dead server until the next time someone speaks in the room. Most other protocols (email, activitypub, etc) suffer the same. Is this what happened? If not, please can you point me at the bug you presumably filed so we can investigate? In terms of push notifications breaking: Riot/iOS & Android have both suffered issues in the past where failure to refresh push tokens with the push server (due to server unavailability) could cause the app to sulk and turn off push entirely. However, I believe these failure modes were fixed over a year ago. Again, any further information so we can check that this was fixed would be great. (Otherwise, it'd be great not to be judged on bugs which were fixed ages ago).
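The failure mode described for matrix-org/synapse#2528 can be illustrated with a toy outbound queue. This is a hypothetical model, not Synapse's federation sender: in it, a send is only attempted when a new event arrives, so anything queued while the remote server was down sits there until the next message in the room. The `on_retry_timer` method is included purely as a contrast within the toy model.

```python
from typing import List


class DestinationQueue:
    """Toy model of the outbound queue from one homeserver to one remote server."""

    def __init__(self) -> None:
        self.pending: List[str] = []
        self.delivered: List[str] = []
        self.remote_up = True

    def on_new_event(self, event_id: str) -> None:
        # New traffic in the room is what normally triggers a send here.
        self.pending.append(event_id)
        self._try_send()

    def on_retry_timer(self) -> None:
        # A timer-driven retry flushes the backlog without waiting for new traffic.
        self._try_send()

    def _try_send(self) -> None:
        if self.remote_up:
            self.delivered.extend(self.pending)
            self.pending.clear()


q = DestinationQueue()
q.remote_up = False
q.on_new_event("$e1")   # remote is down: "$e1" stays queued
q.remote_up = True      # remote recovers, but nothing triggers a send
stuck = list(q.pending)
q.on_new_event("$e2")   # the next message in the room flushes both
```

In this model the first message is not lost, only delayed until the room sees more traffic, which matches the symptom of "messages not delivered" in a quiet room.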
...precisely :/
Yes, the pre-1.0 beta had a bug in the state resolution algorithm which we fixed. And yes, the way to fix bugs in a state resolution algorithm is to migrate your data to use a new version of the algorithm. I'm not sure this counts as a 'questionable security story'.
Weirdly enough we don't use our public github issue tracker for serious security issues. The security tag there almost by definition refers to issues which are sufficiently non-impacting that they can be left in public view. Bashing a project based on a stats analysis of its bugtracker ends up telling you a lot more about the bug filing philosophy of the project than its actual bugginess, imo. We deliberately try to put anything & everything in our bugtrackers, to use them as a knowledge repository of all conceivable defects of the system. It's the diametric opposite of something like (say) postgres, where last time I checked they didn't even have a bugtracker. Our internal security bug tracker gives a much clearer view, which you can see the results of in synapse's changelog by grepping for security (or looking at the hall of fame at https://matrix.org/security-disclosure-policy).
I guess it depends on whether you consider conversation history a first class citizen or not. A good analogy is IMAP versus POP. IMAP is a serverside knowledge store you can depend on; POP was an awful hack to queue up your messages so you could get them onto your client. I'd rather use IMAP for my email than POP, and for the same reasons I think Matrix has the right data model here. That said, if someone wants to do chat over ActivityPub, go wild - we'll just go ahead and bridge it into Matrix :)
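The IMAP/POP contrast comes down to two data models, sketched here with hypothetical classes (not real IMAP or POP client code): a queue that is drained when a client fetches from it, versus a persistent log where each client keeps its own read position. The per-client cursor is only loosely analogous to how a chat client syncs from a server-side history.

```python
from typing import Dict, List


class PopStyleMailbox:
    """POP-style: the server is a queue, and fetching drains it."""

    def __init__(self) -> None:
        self._queue: List[str] = []

    def deliver(self, msg: str) -> None:
        self._queue.append(msg)

    def fetch(self) -> List[str]:
        msgs, self._queue = self._queue, []
        return msgs


class ImapStyleStore:
    """IMAP-style: the server keeps the full history; each client tracks a cursor."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._cursors: Dict[str, int] = {}

    def deliver(self, msg: str) -> None:
        self._log.append(msg)

    def sync(self, client_id: str) -> List[str]:
        # Return everything this client has not seen yet, then advance its cursor.
        start = self._cursors.get(client_id, 0)
        self._cursors[client_id] = len(self._log)
        return self._log[start:]

    def history(self) -> List[str]:
        return list(self._log)


pop = PopStyleMailbox()
pop.deliver("hello")
laptop_pop = pop.fetch()   # the laptop gets the message...
phone_pop = pop.fetch()    # ...and the phone gets nothing

imap = ImapStyleStore()
imap.deliver("hello")
laptop_imap = imap.sync("laptop")
phone_imap = imap.sync("phone")
```

With the queue model, a second device never sees the message; with the store model, every device can catch up and the history remains available.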
It absolutely is a problem, but that isn't necessarily a negative thing. It's a problem you have spent a lot of time and resources dealing with, and have solved to your satisfaction -- but the fact that you had to spend those resources on it means that it absolutely is a problem. You said it's as much of a problem as "merging branches in git", but the last time I did real enterprise work I had a lot of problems with merging branches, and most of the time I ended up just pulling down the repo fresh and copying my work into it, because that was less work.
Right, and that is a problem for people who are implementing the protocol -- i.e. any implementation other than yours. Who will wish to reimplement these things from source? Well, according to your SDKs page there isn't any C implementation (The C implementation listed actually links to Objective-C, "Matrix.org's reusable UI interfaces for iOS"). So anyone making their own language, or anyone that wants to use a niche language, has to implement that from scratch.
Filesystem defragmentation is fundamentally a hack to cope with a badly designed file system. Literally every other filesystem that isn't NTFS does not require defragmentation. Maybe that's a bad example, though?
Right, but this is an article about replacing chat systems. If someone wants to write their own interface to Matrix, or their own Matrix server, it's pretty clear to see that if it goes anything like the primary interface, it's difficult to implement successfully, and slow even if you somehow manage to make it bug-free.
Right, so what you're saying is that the bug tracker, as of now, does not contain the most serious security problems, and yet, judging by the non-serious bugs alone, it still looks pretty hellish. That doesn't seem like a good thing.
If the project can't be bothered to update the bugtracker, or has bad practices around it, then it's fundamentally hostile to both collaborators and users. I've dealt with bad bugtrackers for a lot of chat applications. It's complete and utter hell, and it has been the outright reason why I have not adopted certain chat systems.
Conversation history has to be a first class citizen, or you are, on behalf of your users, throwing their data away without their choice. I know many people do not like to keep chat history, but that's not an excuse to throw out chat history full stop.
@AlexandriaOL - thanks for responding to my points, although I think you might have misunderstood where I was coming from on some of them (probably my fault for not being clearer):
In git, every time more than one person works on a repository, git is effectively merging conflicted branches (their clone of the repo with your clone of the repo) together under the hood. i.e. every time you
Well, empirically people are, just as there are independent implementations of Git. The C/ObjC stuff you mentioned refers to client SDKs, which are super simple to write and don't do state res; only server implementations do state res when talking to each other. But a working independent server implementation of state res exists at https://github.com/matrix-construct/construct/ (in C++, if you're focusing on languages).
My point was that modern filesystems don't need manual defragmentation because they effectively defragment themselves in the background as they go along by shuffling stuff around to avoid fragmentation as they go. This is a direct analogy to the situation with matrix-org/synapse#1760 (which started off effectively forcing users to manually prune their extremities whenever performance got bad) and then the fix which landed in matrix-org/synapse#5319, which stops the problem building up by solving it transparently in the background. So I think it's an appropriate example.
Writing 'interfaces to Matrix' (i.e. clients or client SDKs) is trivial - there are literally hundreds of successful ones now. Writing a Matrix server is indeed harder, but the reason the reference implementation took us ages and isn't perfect isn't because "Matrix is hard to implement well" but because we were figuring it out for the first time as we went, complete with missteps, while also optimising for client developers rather than server developers, given clients are where the users are at. But I think the project should be judged on the end result, not the journey.
No, what I'm saying is that the public bug tracker has the least serious security issues on it... which is why there should be no surprise that there are more open than closed, because by definition they're less important(!!) Judging a project by the number of open bugs is like judging a book by its number of pages. It doesn't tell you much other than the size of the book.
The bugtracker is pretty good at being kept up-to-date. However, we keep unsolved issues open rather than arbitrarily closing them to "make things look better", however minor or deprioritised they are. If you consider that bad practice, then :/
My point was that Matrix is unusual among chat protocols because it treats conversation history as a first class citizen (and decentralises ownership of it over the participating users). This is why it ends up being a decentralised DB, which seems to be the root of eta's complaints. I am not proposing throwing out chat history; just the opposite - treating it as the first class citizen it should be, which by extrapolation means you end up with a protocol that looks like Matrix.
Thoughts on improving chat systems
I’m quite interested in instant messaging technology! I’ve been an avid user of internet relay chat (IRC) –probably the oldest chat protocol in existence – f...
https://theta.eu.org/2019/09/10/chat-systems.html