Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix leaking room state objects on limited sync responses #395

Merged
merged 12 commits into from
Mar 22, 2017

Conversation

kegsay
Copy link
Member

@kegsay kegsay commented Mar 20, 2017

This PR does 2 things to save memory:

  • It forces strings to be shared between MatrixEvent objects. By default they are NOT shared in events because we do JSON.parse(someString), so we do a no-op assignment into a global interns map to force V8 to share strings. Strings are immutable so this has no practical effect other than a bit of space saving. We never remove events either, so the strings which are there should be net positive. On my account, this reduces the amount of memory used from 350MB to 320MB.

  • It force clears the non-live timelines on a limited sync response. I was pleased to see there was already a flag for this: flush. Tested it by spamming #matrix-test whilst offline and:

    • Viewing the room (and making sure it jumps down on reconnect)
    • Not viewing the room

    Can confirm via heap snapshots it is now reaping correctly. This should fix Riot-web is very greedy on RAM (and can make chrome crash) element-hq/element-web#3307 (comment)

event[prop] = interns[event[prop]];
});

["membership", "avatar_url", "displayname"].forEach((prop) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should worry about interns filling up with old displaynames and avatar_urls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never remove events either, so the strings which are there should be net positive.

If we deleted m.room.member events then absolutely, we probably would need to start caring. We still keep all previous names/avatar URLs when the user changes their name/avatar, so frequent name changes will still be referenced somewhere in the code.

@@ -207,7 +207,12 @@ Room.prototype.getLiveTimeline = function() {
*/
Room.prototype.resetLiveTimeline = function(backPaginationToken) {
for (let i = 0; i < this._timelineSets.length; i++) {
this._timelineSets[i].resetLiveTimeline(backPaginationToken);
// Flush out non-live timelines. We do this to reduce the amount of memory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implies we're only flushing out the non-live timelines; I don't think that is correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, right you are

// can be 10s of MBs) for each timeline is expensive. This is particularly
// noticeable when the client unsleeps and there are a lot of 'limited: true'
// responses. https://github.com/vector-im/riot-web/issues/3307#issuecomment-282895568
this._timelineSets[i].resetLiveTimeline(backPaginationToken, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're going to need to convince me that you've properly thought through the consequences of clearing _eventIdToTimeline while the user is still potentially viewing events in an old timeline. This feels like it's going to break as much as it fixes right now :/.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested it by spamming #matrix-test whilst offline and viewing the room (and making sure it jumps down on reconnect)

If you want more convincing then state what you would like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it shouldn't jump down if you're viewing historical events.

A handful of examples: what happens when you scroll down so that the historical timeline merges with the new one? what happens if you then scroll back up? what happens if you jump to a permalink to the events that are now in a timeline which is attached to the live timeline, but are no longer in _eventIdToTimeline ?

Mostly though, I think that a change that deliberately leaves the datastructures in an inconsistent state is a bad idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a change that deliberately leaves the datastructures in an inconsistent state is a bad idea.

I'm not, I'm removing all the events in the timeline then adding the ones I got from a limited sync. Where is the inconsistency? The event ID to timeline map is cleared when we flush to preserve consistency: https://github.com/matrix-org/matrix-js-sdk/blob/master/src/models/event-timeline-set.js#L163

it shouldn't jump down if you're viewing historical events.

Why? This strictly happens on a limited /sync only, which restricts this to startup/load/unsleep. If we don't jump down, then the events needs to hang around until some indeterminate point in time when they can be reaped, and it's unclear when that should happen or how that should work, and sounds exceptionally prone to leaking the events beyond the point where you've removed them from the JS SDK object model. Your examples focus on the assumption that the events are still around, and the entire point of this PR is to make them NOT hang around.

If they don't hang around, then it's really quite simple: we drop all events from the room and only populate the most recent 20 messages. The things to check are:

  • Make sure that the other events are GCed.
  • Make sure that if you were viewing some of the older events, that the screen jumps to the bottom as they no longer exist.

Both of those cases I have manually tested and confirmed. I think we may have philosophical differences on how this should behave, so may need @ara4n's opinion on how to proceed.

@richvdh richvdh assigned kegsay and unassigned richvdh Mar 21, 2017
@richvdh
Copy link
Member

richvdh commented Mar 21, 2017

I'm not, I'm removing all the events in the timeline then adding the ones I got from a limited sync. Where is the inconsistency?

If the user is viewing historical events, and then forward paginates to the live timeline, the historical events will get joined onto the live timeline, hence: "what happens if you jump to a permalink to the events that are now in a timeline which is attached to the live timeline, but are no longer in _eventIdToTimeline ?"

it shouldn't jump down if you're viewing historical events.

Why? This strictly happens on a limited /sync only, which restricts this to startup/load/unsleep

... or any period where more than 20 messages arrive between /sync calls. If I've carefully scrolled back to a bit of history, and a flurry of activity combined with a network glitch means that I get a limited sync, I am going to be cursing your name if my view suddenly jumps to the bottom.

@richvdh
Copy link
Member

richvdh commented Mar 21, 2017

Anyway: if the application layer is obliged to make sure it has dropped all references to historical timelines on a limited sync (which I still think is a bad idea for the UX), you need to plaster that big and bold across the APIs, rather than discovering that it appears to be ok in Riot and calling it job done.

@kegsay
Copy link
Member Author

kegsay commented Mar 21, 2017

If the user is viewing historical events, and then forward paginates to the live timeline

The user cannot forward paginate in this scenario, because the timelines were dropped. It causes the timeline to fire http://matrix-org.github.io/matrix-js-sdk/0.7.5/module-client.html#~event:MatrixClient%2522Room.timelineReset%2522 - It's wholly unclear what we expect clients to do when this event fires. The docs don't say.

what happens if you jump to a permalink to the events that are now in a timeline which is attached to the live timeline, but are no longer in _eventIdToTimeline ?

I cannot test this because the timelines are dropped on a limited sync, which is what I've repeatedly been saying. Breaking down your question:

  • what happens if you jump to a permalink : I can do this, but there's nothing special about it because none of those events are in any timeline.
  • events that are now in a timeline which is attached to the live timeline : If I scroll down from the permalink, I will eventually reach the live timeline, which I guess counts as "attached", so yes I can do that too.
  • but are no longer in _eventIdToTimeline : This bit I can't do. They will be in _eventIdToTimeline.

How do you want me to proceed?

@richvdh
Copy link
Member

richvdh commented Mar 21, 2017

I'm not quite sure where the disconnect is happening here - we are obviously still somehow at cross-purposes. So let me try to be absolutely clear.

I fired up a riot build with your branch of the js-sdk, and did the following:

  • scroll back a little way in the room. Grab a permalink while you are there.
  • go offline
  • spam a bunch of messages into the room to trigger a limited sync
  • go back online (wait for the sync to notice)
  • scroll back down.
  • You now have events in the timeline which are not in _eventIdToTimeline. (I checked.)
  • Navigate to the permalink (I did this just by sticking the event ID in the URL bar)
  • The console logs now say:
event-timeline-set.js:400 Already have timeline for $1490131020162teLDt:fred.sw1v.org - joining timeline !RoiHwUcCARaykMwyRo:fred.sw1v.org:2017-03-21T21:26:38.955Z to !RoiHwUcCARaykMwyRo:fred.sw1v.org:2017-03-21T21:24:04.933Z
event-timeline.js:217 Uncaught Error: timeline already has a neighbouring timeline - cannot reset neighbour
    at EventTimeline.setNeighbouringTimeline (event-timeline.js:217)
    at EventTimelineSet.addEventsToTimeline (event-timeline-set.js:404)
    at client.js:1963
    at _fulfilled (q.js:834)
    at self.promiseDispatch.done (q.js:863)
    at Promise.promise.promiseDispatch (q.js:796)
    at q.js:604
    at runSingle (q.js:137)
    at flush (q.js:125)
    at onNextTick (main.js:64)

This process used to work fine. TBH, it still does, modulo the scary error in the console, but I think that errors like that in the console logs are unacceptable, and in this case symptomatic of other problems.

Let's discuss what to do about it when we meet in person.

@kegsay
Copy link
Member Author

kegsay commented Mar 22, 2017

scroll back down.

This is the bit where we are crossing purposes. I can't scroll down, since the instant it syncs the timeline flashes and jumps to the bottom. Other than that, your steps sound the same as mine.

@richvdh
Copy link
Member

richvdh commented Mar 22, 2017 via email

@kegsay
Copy link
Member Author

kegsay commented Mar 22, 2017

Right, I think I know why. We do different things depending on the scroll position when the timeline resyncs. If it's at the bottom things work correctly and the timeline reloads, else it ignores the reset request. It sounds like we've been hitting one or the other case which is why we don't agree on the results.

https://github.com/matrix-org/matrix-react-sdk/blob/master/src/components/structures/TimelinePanel.js#L429

@richvdh
Copy link
Member

richvdh commented Mar 22, 2017

Yes, that's what I've been trying to tell you...

@kegsay
Copy link
Member Author

kegsay commented Mar 22, 2017

Conclusions from IRL:

  • The PR in its current state causes inconsistencies in the timeline because the app may have references to older parts of the timeline which will now get removed.
  • Due to this, it is ultimately not possible for the SDK to dictate when it is safe to reap events from a timeline: the SDK needs to ask the app if it is safe to reset the entire timeline for a room.
  • Proposal is to add a function to the opts of MatrixClient: canResetEntireTimeline(roomId) => bool which the SDK will call when it wants to reset the timeline (on a limited sync).
  • By default, this function always returns false which preserves the original behaviour.
  • The result of this function is directly passed as the flush flag: this means if you can reset the entire timeline for a room, then flush all non-live timelines.
  • An extra arg will be added to the Room.timelineReset event which fires, which indicates whether the app said it was safe to reset the entire timeline.

React SDK will implement this function and will always return true UNLESS:

  • The user is currently viewing the room AND
  • The timeline panel is not at the bottom.

@richvdh Does this sound accurate?

@richvdh
Copy link
Member

richvdh commented Mar 22, 2017

ja think so

Anything you can do to improve crappy jsdocs while you are in there would be very much appreciated

@kegsay
Copy link
Member Author

kegsay commented Mar 22, 2017

@richvdh PTAL

@kegsay kegsay assigned richvdh and unassigned kegsay Mar 22, 2017
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good except for grumbling about names and docs

@@ -187,7 +187,7 @@ EventTimelineSet.prototype.resetLiveTimeline = function(backPaginationToken, flu
newTimeline.setPaginationToken(backPaginationToken, EventTimeline.BACKWARDS);

this._liveTimeline = newTimeline;
this.emit("Room.timelineReset", this.room, this);
this.emit("Room.timelineReset", this.room, this, flush);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance we can improve the name of flush (resetAll seems fine) and clarify what it means to flush the non-live timelines?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, should be flush || !this._timelineSupport. Or better yet, factor out the condition so that they don't get out of sync.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 14727d7

@@ -655,4 +655,6 @@ module.exports = EventTimelineSet;
* @event module:client~MatrixClient#"Room.timelineReset"
* @param {Room} room The room whose live timeline was reset, if any
* @param {EventTimelineSet} timelineSet timelineSet room whose live timeline was reset
* @param {boolean} resetAll The return value for canResetEntireTimeline() for this room.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like a bit of a backwards description to me. At this point, as an app, I am interested in what the js-sdk did (ie, reset all the timelines - again, needs a better description), rather than why it did it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 14727d7

@kegsay kegsay assigned richvdh and unassigned kegsay Mar 22, 2017
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Riot-web is very greedy on RAM (and can make chrome crash)
2 participants