Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

/sync returns 404 with "Could not find event" if the last event in a room was rejected #12571

Closed
richvdh opened this issue Apr 27, 2022 · 8 comments · Fixed by #12729
Closed
Assignees
Labels
A-Sync defects related to /sync T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Copy link
Member

richvdh commented Apr 27, 2022

/sync requests fail with logs like the following:

2022-04-27 19:52:55,224 - synapse.http.server - 95 - INFO - GET-2574276 - <XForwardedForRequest at 0x7fbc161945d8 method='GET' uri='/_matrix/client/r0/sync?filter=139&timeout=0&since=s2893808910_757284974_3577632_1332288051_1339908640_3205332_474955574_3498902419_212414' clientproto='HTTP/1.1' site='21126'> SynapseError: 404 - Could not find event $4KSidiqM831pDAWAg0CikeahLWUySLhqAHadbBKtucI

The event id in question is a rejected event.

Stacktrace for the exception:

  File "/home/synapse/src/synapse/handlers/sync.py", line 407, in current_sync_for_user
    sync_config, since_token, full_state
  File "/home/synapse/src/synapse/handlers/sync.py", line 1114, in generate_sync_result
    sync_result_builder, account_data_by_room
  File "/home/synapse/src/synapse/handlers/sync.py", line 1624, in _generate_sync_entry_for_rooms
    await concurrently_execute(handle_room_entries, room_entries, 10)
  File "/home/synapse/src/synapse/util/async_helpers.py", line 236, in concurrently_execute
    _concurrently_execute_inner, (value for value in itertools.islice(it, limit))
  File "/home/synapse/src/synapse/util/async_helpers.py", line 276, in yieldable_gather_results
    raise dfe.subFailure.value from None
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/home/synapse/src/synapse/util/async_helpers.py", line 227, in _concurrently_execute_inner
    await maybe_awaitable(func(value))
  File "/home/synapse/src/synapse/handlers/sync.py", line 1619, in handle_room_entries
    always_include=sync_result_builder.full_state,
  File "/home/synapse/src/synapse/handlers/sync.py", line 2127, in _generate_room_entry
    full_state=full_state,
  File "/home/synapse/src/synapse/handlers/sync.py", line 956, in compute_state_delta
    room_id, stream_position=since_token, state_filter=state_filter
  File "/home/synapse/src/synapse/handlers/sync.py", line 669, in get_state_at
    end_token=stream_position.room_key,
  File "/home/synapse/src/synapse/storage/databases/main/stream.py", line 782, in get_last_event_in_room_before_stream_ordering
    event = await self.get_event(event_id, get_prev_content=True)
  File "/home/synapse/src/synapse/storage/databases/main/events_worker.py", line 402, in get_event
    raise NotFoundError("Could not find event %s" % (event_id,))

It looks like get_last_event_in_room_before_stream_ordering is returning that rejected event, and we're then trying to calculate the state at it.

Jaeger link for internal users: https://jaeger.proxy.matrix.org/trace/bc8f105a8857ad29?uiFind=562187d6a0f8c25a

@MadLittleMods MadLittleMods added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Sync defects related to /sync labels Apr 27, 2022
@richvdh
Copy link
Member Author

richvdh commented May 5, 2022

The root problem here appears to be that the last message in the room before the stream_ordering in the since token (2893808910) was rejected:

matrix=> select event_id, stream_ordering, rejection_reason from events where room_id='!yomrOFwgFXzmeMAbzX:matrix.org' and stream_ordering < 2893808910 order by stream_ordering desc limit 2;
                   event_id                   | stream_ordering | rejection_reason 
----------------------------------------------+-----------------+------------------
 $4KSidiqM831pDAWAg0CikeahLWUySLhqAHadbBKtucI |      2893741324 | auth_error
 $95Z6YKWD2yyt7cZIyccEp6kikdFNu1eOSR-IIfOdwLg |      2893739012 | 
(2 rows)

I guess get_last_event_in_room_before_stream_ordering needs to skip over rejected events.

(Also, it shouldn't raise a SynapseError if it hits an error like this - this should be a 500, not a 404.)

@erikjohnston
Copy link
Member

erikjohnston commented May 7, 2022

diff --git a/synapse/storage/databases/main/stream.py b/synapse/storage/databases/main/stream.py
index 793e906630..b1e9c6ede3 100644
--- a/synapse/storage/databases/main/stream.py
+++ b/synapse/storage/databases/main/stream.py
@@ -747,7 +747,7 @@ class StreamWorkerStore(EventsWorkerStore, SQLBaseStore):
                 "SELECT stream_ordering, topological_ordering, event_id"
                 " FROM events"
                 " WHERE room_id = ? AND stream_ordering <= ?"
-                " AND NOT outlier"
+                " AND NOT outlier AND rejection_reason IS NULL"
                 " ORDER BY stream_ordering DESC"
                 " LIMIT 1"
             )

is a patch that fixed this for me. I don't do think the real fix involves joining against rejections since I think events.rejection_reason is new and hasn't been back dated?

@richvdh
Copy link
Member Author

richvdh commented May 11, 2022

I don't think the real fix involves joining against rejections since I think events.rejection_reason is new and hasn't been back dated?

ITYM "I do think"? Correct, rejection_reason needs backfilling. See #11496.

The patch looks about right to me, modulo joining on rejections instead of using events.rejection_reason.

@luilegeant
Copy link

luilegeant commented May 12, 2022

Hello,
I have a similar issue that since today has become problematic, i can't log in with a new client (element web) or clear cache and reload (on element desktop).
All other existing client (devices) are still working (I don't want to risk it by unlogging/clearing cache).

How did I end up there:
After realizing that 11 days of silence in the conduit room (#conduit:fachschaften.org) was a big long, I started looking into it (conduit gitlab issues). I realized it was hit by a spam flood (some part of the story there: https://gitlab.com/famedly/conduit/-/issues/270).
Afterward I tried to clear cache for my element desktop would help catch up? Bad idea, since then element gives me "unable to connect to homeserver" and the element logs give me the same error as mentioned above: 404 with message: could not find event $xxx

Any suggestion I could try by "hand" (admin rest api) to "fix" the situation" ?
Anything I could dig out in my db to help you validate your fix above ?

addendum: I'm running synapse (1.58.1), and it might not be related to conduit room's state. I just taught it might help set the picture of how i encountered the same error message as above.

@jaylik
Copy link

jaylik commented May 12, 2022

We had similar problem with room that has messages automatically purged every 1h with /_synapse/admin/v1/purge_history/ and then all messages that bot has sent after purge were redacted. Sync would fail for new clients that are part of the room.

Issue was resolved by posting something after redacted messages to room.

@erikjohnston
Copy link
Member

I think this broke in #12319

@erikjohnston erikjohnston self-assigned this May 13, 2022
@jaylik
Copy link

jaylik commented May 14, 2022

fed9bba seems to fix the issue, but requires atleast Element IOS clients to manually clear cache to get sync working again.

@luilegeant
Copy link

Hello,
A little update.
After digging a little, I realised that the conduit events weren't related. It was just my personal trigger to start looking into it.
The missing event id was from another server altogether and the event type was: m.room.redaction found via select * from events where event_id = '$...'.
From another device, I've left that room (old dm) and the sync started to work again and logging in from a new device too.

@DMRobertson DMRobertson changed the title /sync returns 404 with "Could not find event" /sync returns 404 with "Could not find event" if the last event in a room was rejected May 16, 2022
@DMRobertson DMRobertson linked a pull request May 16, 2022 that will close this issue
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Sync defects related to /sync T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants