Remove stream_positions table #17047

Closed

Conversation

@realtyem (Contributor) commented Apr 4, 2024

The stream_positions table is used to track the last sequence number (as a stream ID) persisted to the database, and only on Postgres. The table is updated frequently but is only read from during startup. Since all of the data in that table can be discovered from each stream's actual tables, just read those at startup and don't bother with this extra table.

After startup, this data is maintained in memory the same way it is today.
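
To make the idea concrete, here is a minimal sketch of the kind of startup lookup that could replace the table (hedged: the helper name is illustrative, and each stream's table differs slightly, but presence_stream does have instance_name and stream_id columns):

```python
from typing import Dict

def load_positions_from_stream_table(cur, table: str = "presence_stream") -> Dict[str, int]:
    """Recover each writer's last persisted position from the stream's own
    table instead of reading the dedicated stream_positions table."""
    # MAX(stream_id) per writer is exactly the value stream_positions tracks.
    # The table name is interpolated only because this is an illustration.
    cur.execute(
        f"SELECT instance_name, MAX(stream_id) FROM {table} GROUP BY instance_name"
    )
    return {instance: stream_id for instance, stream_id in cur.fetchall()}
```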

The motivator for this change:

2024-02-22 16:24:01.454 CST [127] LOG:  duration: 1836.864 ms  plan:
        Query Text:
                    INSERT INTO stream_positions (stream_name, instance_name, stream_id)
                    VALUES ('presence_stream', 'presence1', (87199721))
                    ON CONFLICT (stream_name, instance_name)
                    DO UPDATE SET
                        stream_id = EXCLUDED.stream_id
                    WHERE stream_positions.stream_id < EXCLUDED.stream_id

        Insert on stream_positions  (cost=0.00..0.01 rows=0 width=0) (actual time=1836.861..1836.862 rows=0 loops=1)
          Conflict Resolution: UPDATE
          Conflict Arbiter Indexes: stream_positions_idx
          Conflict Filter: (stream_positions.stream_id < excluded.stream_id)
          Tuples Inserted: 0
          Conflicting Tuples: 1
          ->  Result  (cost=0.00..0.01 rows=1 width=72) (actual time=0.002..0.003 rows=1 loops=1)
2024-02-22 16:24:01.538 CST [148] LOG:  process 148 still waiting for ExclusiveLock on tuple (0,110) of relation 20495 of database 19796 after 1000.080 ms
2024-02-22 16:24:01.538 CST [148] DETAIL:  Process holding the lock: 149. Wait queue: 151, 97, 150, 148, 124.
2024-02-22 16:24:01.538 CST [148] STATEMENT:
                    INSERT INTO stream_positions (stream_name, instance_name, stream_id)
                    VALUES ('presence_stream', 'presence1', (87199723))
                    ON CONFLICT (stream_name, instance_name)
                    DO UPDATE SET
                        stream_id = EXCLUDED.stream_id
                    WHERE stream_positions.stream_id < EXCLUDED.stream_id

2024-02-22 16:24:01.883 CST [127] LOG:  duration: 2265.752 ms  statement:
                    INSERT INTO stream_positions (stream_name, instance_name, stream_id)
                    VALUES ('presence_stream', 'presence1', (87199721))
                    ON CONFLICT (stream_name, instance_name)
                    DO UPDATE SET
                        stream_id = EXCLUDED.stream_id
                    WHERE stream_positions.stream_id < EXCLUDED.stream_id

Let's just...not do this 🤷‍♂️

Pull Request Checklist

…t's near the beginning of the init process. Just set it to the 1 as the default.
…th new SQL looking up the data directly in the associated tables for a given stream writer
…xisting_stream()

Without a `stream_positions` table to lean on, this only needed a few small tweaks to wording and actual representative positions
@realtyem marked this pull request as ready for review April 4, 2024 09:56
@realtyem requested a review from a team as a code owner April 4, 2024 09:56
@@ -451,7 +443,8 @@ def __init__(
self._current_positions.values(), default=1
)

# For the case where `stream_positions` is not up to date,
# TODO: this needs updating or verifying
@realtyem (Contributor Author) commented Apr 4, 2024
RE: TODO comment
Actually, I believe this block can be removed. _max_seen_allocated_stream_id is already set just above and will already equal _persisted_upto_position because of what happens inside _load_current_ids().

Either it is already set to the max() (see new line 519),
or it is already set to the max() through the iteration in _add_persisted_positions() (see new line 561).
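
As a toy illustration of the invariant being claimed here (made-up values, not the actual Synapse code):

```python
# _load_current_ids() populates the per-writer positions; the max-seen
# value is derived from that same dict, so it can never end up below the
# persisted-up-to position (which is bounded by the minimum over writers).
current_positions = {"presence1": 87199721, "presence2": 87199500}
max_seen_allocated_stream_id = max(current_positions.values(), default=1)
persisted_upto_position = min(current_positions.values(), default=1)

# The block in question raised max_seen up to persisted_upto; given the
# above, that assignment can never change anything.
assert max_seen_allocated_stream_id >= persisted_upto_position
```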

@realtyem (Contributor Author)
(I see my ability to show the spot correctly has not improved with time)

# to be persisted or will default to 1.
# TODO: is this next comment still valid?
# (This can be a problem for e.g. backfill streams where the server has
# never backfilled).
@realtyem (Contributor Author) commented Apr 4, 2024
RE: TODO comment
I believe this is resolved by setting max_stream_id to 1 at the top of the function. If I'm following the juggling of negatives and positives correctly, this one is always positive?
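
For context, a simplified sketch of that sign juggling (based on the generator's `positive` flag; simplified, not verbatim from the codebase):

```python
# Backwards streams such as backfill hand out negative ids, but positions
# are tracked as positive integers internally and the sign is flipped on
# the way out.
positive = False                      # e.g. the backfill stream
return_factor = 1 if positive else -1

internal_position = 1                 # the default set at the top of the function
external_id = return_factor * internal_position  # -1 for a backfill stream

# Internally the value stays positive, which is what the comment relies on.
assert internal_position > 0
```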

- # If we add back the old "first" then we shouldn't see the persisted up
- # to position revert back to 3.
+ # If we add back the old "first" then we should see the persisted up
+ # to position revert back to 3, as this writer hasn't written anything since.
@realtyem (Contributor Author)

I postulate that it actually should be a 3 and not a 6, as "first" hasn't actually written anything since its initial 3, reflecting its minimum persisted value.

I assert that the first instance created in this test is proof, since its instance name is neither of the named writers and it is therefore a 'reader' and not a 'writer', just as this 5th instance is.
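
A toy model of that scenario (instance names and positions from the test; treating the persisted-up-to position as the minimum over known writers is a simplification):

```python
# With only "second" writing, the persisted-up-to position can advance to 6.
writer_positions = {"second": 6}
assert min(writer_positions.values()) == 6

# Re-adding the old "first" at its initial position 3 should drag the
# persisted-up-to position back down, since positions between 3 and 6 are
# no longer known to be persisted by every writer.
writer_positions["first"] = 3
assert min(writer_positions.values()) == 3  # 3, not 6
```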

@erikjohnston (Member)

This table got added in matrix-org/synapse#8374; does the rationale no longer make sense?

@realtyem (Contributor Author) commented Apr 4, 2024

> This table got added in matrix-org/synapse#8374; does the rationale no longer make sense?

Thank you for pointing to this; I was not aware of the 'why' behind it. I will have to process what all of that means for a while, I think. Your question deserves a good response.

It does appear that the majority of the tables do not have indexes on instance_name, so if my Postgres knowledge is up to par, a table scan would end up being required.

However, I believe eliminating the excess of dead tuples in Postgres (not to mention the wear on hard drives) in exchange for a table scan on startup may be worth the effort. I don't suppose you recall how long the table scan was taking? Whatever evidence my server has would not be particularly useful next to a larger instance such as matrix.org. (Just as a curiosity.)
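
As a purely hypothetical way to answer that curiosity, one could time the GROUP BY scan that would replace the stream_positions read (table name is an example; a large deployment would try its biggest stream table):

```python
import time

def time_startup_scan(cur, table: str = "presence_stream") -> float:
    """Time the per-writer MAX(stream_id) scan discussed above."""
    start = time.monotonic()
    cur.execute(
        f"SELECT instance_name, MAX(stream_id) FROM {table} GROUP BY instance_name"
    )
    cur.fetchall()
    return time.monotonic() - start
```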

@erikjohnston (Member) commented Apr 4, 2024

I agree that it would be good to find a way of reducing the DB load due to it, just not quite sure how atm 🙂

@erikjohnston marked this pull request as draft April 4, 2024 12:26
@erikjohnston removed the request for review from a team April 4, 2024 12:26
@realtyem (Contributor Author) commented Apr 5, 2024

I think I have a different angle to try on this, so I will close it.

@realtyem closed this Apr 5, 2024