Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Spec compliance: require a request body on POST /rooms/{roomId}/receipt/{receiptType}/{eventId} #10534

Closed
3 of 4 tasks
reivilibre opened this issue Aug 4, 2021 · 21 comments · Fixed by #12709
Closed
3 of 4 tasks
Assignees
Labels
A-Spec-Compliance places where synapse does not conform to the spec T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@reivilibre
Copy link
Contributor

reivilibre commented Aug 4, 2021

We currently don't require a body on POST /rooms/{roomId}/receipt/{receiptType}/{eventId}.

Initially, this seems to have been a mistake, but since #10531 [v1.40.0rc2], it's a workaround because Element Android relied on that behaviour (which we broke by actually using the body for hidden read receipts in #10413 [v1.40.0rc1]).

Blockers:

Eventually, we should return to enforcing the spec.

@reivilibre reivilibre added the Z-Future-Maintenance Things that can't yet be done, but will need cleaning up in a couple of months/releases label Aug 4, 2021
@clokep clokep added the A-Spec-Compliance places where synapse does not conform to the spec label Aug 4, 2021
@richvdh
Copy link
Member

richvdh commented Aug 5, 2021

I'd like us to set a timescale for deprecation of this workaround while it's still fresh in our heads.

@callahad callahad self-assigned this Aug 12, 2021
@erikjohnston erikjohnston added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Aug 23, 2021
@callahad
Copy link
Contributor

callahad commented Aug 26, 2021

Fixed in Element Android this week, will be in EA 1.2.1. Have asked for an estimate of user uptake so we can figure out a timeline and will report back once I know.

I've also mentioned this anticipated change in the Matrix Client Developers room.

@callahad
Copy link
Contributor

General feeling is "the longer the better" for Android uptake, but at least 4-6 weeks. We should watch traffic on matrix.org to confirm.

@richvdh
Copy link
Member

richvdh commented Aug 26, 2021

Thanks Dan. Let's revisit this at the end of October in the hope that we will feel comfortable fixing it then?

@reivilibre
Copy link
Contributor Author

reivilibre commented Aug 26, 2021

"the longer the better" for Android uptake

yeah, especially considering F-Droid (the builds lag behind a bit and generally require manual updating).
I'm certainly guilty of not updating for weeks (though it's also likely that F-Droid users will tend to know how to fend for themselves a bit, and realise that updating might fix bugs)...

@callahad callahad changed the title Spec compliance: require a request body on POST /rooms/{roomId}/receipt/{receiptType}/{eventId} [2021-10-04: Review] Spec compliance: require a request body on POST /rooms/{roomId}/receipt/{receiptType}/{eventId} Sep 2, 2021
@callahad callahad added this to the Revisit: Monthly milestone Sep 15, 2021
@callahad callahad changed the title [2021-10-04: Review] Spec compliance: require a request body on POST /rooms/{roomId}/receipt/{receiptType}/{eventId} Spec compliance: require a request body on POST /rooms/{roomId}/receipt/{receiptType}/{eventId} Oct 7, 2021
@callahad
Copy link
Contributor

callahad commented Oct 7, 2021

Let's check matrix.org stats for EA < 1.2.1, but otherwise target for the tail end of the Synapse 1.46 window (thus merging around 21st Oct to give us a few days before RC1)

@callahad callahad removed this from the Revisit: Next Week milestone Oct 14, 2021
@callahad
Copy link
Contributor

Checking stats this week

@callahad
Copy link
Contributor

Did a little bit of sleuthing and for users on matrix.org last Wednesday (2021-10-13), who had Android in their user agent string, roughly:

  • 85.5% were on known good versions of clients
  • 14.4% were on known bad versions of clients
  • 0.1% were on unknown clients

I don't think we can unilaterally break that fraction of our Android users, but the known clients are known, so we could constrain the hack to only allow empty bodies from those specific clients. This will at least prevent the problem from getting worse.

@callahad
Copy link
Contributor

Specifically, we could limit the hack to only apply when the user agent starts with:

  • Anything containing Riot (e.g., Old Riot.im, Riot, RiotX, Element (Riot.im))
  • Anything starting with Element/1.[012].*
  • Anything starting with SchildiChat/1.[012].*

@callahad callahad removed their assignment Oct 15, 2021
@callahad
Copy link
Contributor

Question: Do we constrain the workaround per above, or do we wait even longer? The rate of decay is slow enough that I don't think we'll dip under 10% for many months.

@reivilibre
Copy link
Contributor Author

I'd be pretty reluctant to add user agent sniffing in to the workaround; it feels nasty to make the homeserver have to discriminate on user agent (and it's not particularly fair for e.g. forks of Element that we're not aware of, which may not be sending the same user agent).

However, 15% of Android users is a substantial number of users to cut off (were you looking at the active sessions, daily visits or all the devices, ooi?). I would be unhappy doing so because it will just be one more thing that makes Matrix as a whole look buggy.

@callahad
Copy link
Contributor

(were you looking at the active sessions, daily visits or all the devices, ooi?)

I was specifically looking at the results of the following query:

SELECT COUNT(*), user_agent
FROM user_daily_visits
WHERE timestamp >= extract(epoch from timestamp '2021-10-13')::bigint * 1000
  AND timestamp < extract(epoch from timestamp '2021-10-14')::bigint * 1000 
  AND user_agent ILIKE '%android%'
  AND user_agent NOT ILIKE 'Mozilla%'
GROUP BY user_agent
ORDER BY user_agent

@callahad
Copy link
Contributor

callahad commented Oct 18, 2021

it feels nasty to make the homeserver have to discriminate on user agent

Agreed that a UA-specific hack is... a hack... but at least it stops new callers from accidentally adopting the incorrect behavior, so it at least ensures that the ecosystem does not regress.

it's not particularly fair

I think it's fair enough. To the best of our knowledge (which includes actually breaking people in production), only Element Android and its forks have ever been subject to this particular misimplementation of the spec.

From looking at all clients that advertised themselves as "android" on October 13th, limiting the hack to just old versions of Riot, Element Android, and SchildiChat would get us to a point where literally more than 99.9% of all Android clients we saw on matrix.org would explicitly be known to work.

Basically, I'm willing to risk that long tail of fewer than 0.1% of android clients which we've not explicitly verified whether they comply with the spec or not.

@callahad
Copy link
Contributor

We'll implement the UA hack as a stopgap.

@callahad
Copy link
Contributor

Tracking the UA hack in #11156, punting this into the "look at it next year" bucket.

@callahad callahad added this to the Revisit: Next Year milestone Oct 21, 2021
@callahad
Copy link
Contributor

callahad commented Jan 6, 2022

Next step is to rip out the hack once we're comfortable with ecosystem adoption...

Dan to pull numbers again.

@callahad callahad self-assigned this Jan 6, 2022
@callahad callahad removed this from the Revisit: Next Year milestone Jan 6, 2022
@callahad
Copy link
Contributor

callahad commented Jan 6, 2022

(Waiting until next week Wednesday just so we get a more representative sample of people coming back from holidays)

@clokep
Copy link
Member

clokep commented Mar 17, 2022

FWIW #12168 is an update to the hidden read receipt logic which means we don't actually need a body there anymore. We could remove all the code that cares about the body or still move forward with this?

@reivilibre
Copy link
Contributor Author

It's not a big deal but I think we should still fix this; it's only fair to other homeserver implementations that we should enforce the spec a bit when we become aware of a deficiency.

@erikjohnston
Copy link
Member

Android team are happy for us to drop the workaround patch now!

@erikjohnston erikjohnston removed the Z-Future-Maintenance Things that can't yet be done, but will need cleaning up in a couple of months/releases label May 6, 2022
@DMRobertson
Copy link
Contributor

Android team are happy for us to drop the workaround patch now!

In that case, next step is to strip out the user-agent specific logic added in #11157, and write a removal notice in the changelog.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Spec-Compliance places where synapse does not conform to the spec T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants