Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update spec and explainer to clarify pose and transform semantics #569

Closed
wants to merge 3 commits into from

Conversation

klausw
Copy link
Contributor

@klausw klausw commented Mar 23, 2019

This is intended to address issues #477, #565, and #567 which all involve areas where the spec isn't entirely clear about intended behavior.

@klausw
Copy link
Contributor Author

klausw commented Mar 23, 2019

FWIW, I think originOffset makes it difficult to explain reference spaces things clearly. An XRReferenceSpace basically consists of two XRSpaces that are transformed into each other by the originOffset, and it's hard to explain how that transform works when the two spaces don't have names. I've used the name "effective reference space" for the offset-modified externally visible space and "unmodified reference space" for the internal space corresponding to an identity originOffset, but that's clunky.

@toji toji added the agenda Request discussion in the next telecon/FTF label Mar 25, 2019
@toji toji added this to the Next Working Draft milestone Mar 25, 2019
index.bs Outdated Show resolved Hide resolved
update explainer to match updated spec, add more examples around teleporting

Fix spelling for XRRigidTransform description

inverse()->inverse in explainer

Rebase
@klausw
Copy link
Contributor Author

klausw commented Apr 2, 2019

Rebased to pick up the inverse()=>inverse attribute change from 9ec29f5, and combined into a single commit.

index.bs Outdated
@@ -681,7 +681,7 @@ Spaces {#spaces}
XRSpace {#xrspace-interface}
------------------

An {{XRSpace}} describes an entity that is tracked by the [=/XR device=]'s tracking systems. {{XRSpace}}s MAY NOT have a fixed spatial relationship to one another or to any given {{XRReferenceSpace}}. The transform between two {{XRSpace}} can be evaluated by calling an {{XRFrame}}'s {{XRFrame/getPose()}} method. The interface is intentionally opaque.
An {{XRSpace}} describes an entity that is tracked by the [=/XR device=]'s tracking systems. It conceptually corresponds to a specific origin and orientation, but does not directly expose any numerical coordinate values. {{XRSpace}}s MAY NOT have a fixed spatial relationship to one another or to any given {{XRReferenceSpace}}. The current transform between two {{XRSpace}}s can be evaluated by calling an {{XRFrame}}'s {{XRFrame/getPose()}} method. The interface is intentionally opaque.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"current" is a hard concept to pin down in WebXR. This whole sentence could probably stand to be rephrased to clarify that. Something along the lines of:

"The transform between two {{XRSpace}}s at the time represented by an {{XRFrame}} can be evaluated by calling the frame's {{XRFrame/getPose()}} method."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 30ca9d2 . I've also rephrased the previous sentence to make it explicit that the spatial relationship can change even if there's no real-world movement. New text:

"The spacial relationships between {{XRSpace}}s or {{XRReferenceSpace}} change over time in response to movement of tracked entities, but can also change without real-world movement as a result of ongoing tracking system recalibration. The transform between two {{XRSpace}}s at the time represented by an {{XRFrame}} can be evaluated by calling the frame's's {{XRFrame/getPose()}} method."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason for rephrasing that sentence was that I wanted to avoid the MAY NOT construction - that's not one of the rfc 2119 key terms, and it could be misinterpreted as disallowing fixed spacial relationships as opposed to warning not to assume them.

@Manishearth
Copy link
Contributor

FWIW, I think originOffset makes it difficult to explain reference spaces things clearly

Filed #580

@klausw
Copy link
Contributor Author

klausw commented Apr 3, 2019

If I'm understanding it right, the consensus from the meeting was that we'll be revisiting originOffset as per @Manishearth's #580, but that shouldn't block us from proceeding with the bulk of the changes in this proposal. The originOffset part of it is localized to just one "unstable" paragraph

Reviewers, please let me know if you want me to make further changes, or if it's OK as-is.

@Manishearth
Copy link
Contributor

Yeah, I would still like this pull request to land; it contains changes that are independent of originOffset, and the changes that depend on originOffset are still an unambiguous improvement, which at worst will be overwritten by #580.

Also explicitly mention that spacial relationships can change without
any real-world movement due to ongoing tracking system recalibration.
@toji toji removed agenda Request discussion in the next telecon/FTF labels Apr 10, 2019
1. If |destinationSpace|'s [=XRSpace/session=] does not equal |session|, throw an {{InvalidStateError}} and abort these steps.
1. If |sourceSpace|'s [=XRSpace/session=] does not equal |session|, throw an {{InvalidStateError}} and abort these steps.
1. If |destinationSpace|'s pose cannot be determined relative to |sourceSpace|, return <code>null</code>
1. Return a new {{XRPose}} containing the {{XRRigidTransform}} from |sourceSpace| to |destinationSpace|. The transform's {{XRRigidTransform/position}} is the location of the |sourceSpace| origin in |destinationSpace| coordinates.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone have good naming suggestions here? "sourceSpace" and "destinationSpace" seemed like the best fit for describing the transform, but it's a bit clunky when talking about getPose.

Do any of these variants look better? I've added snippets how the naming would affect this paragraph and the XRRigidTransform definition below.

getPose(XRSpace sourceSpace, XRSpace destinationSpace); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |sourceSpace| to |destinationSpace|. The transform's {{XRRigidTransform/position}} is the location of the |sourceSpace| origin in |destinationSpace| coordinates.

An {{XRRigidTransform}} is a transform from a source {{XRSpace}} to a destination {{XRSpace}} described by ...

getPose(XRSpace space, XRSpace relativeTo); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |space| to |relativeTo|. The transform's {{XRRigidTransform/position}} is the location of the |space| origin in |relativeTo| coordinates.

An {{XRRigidTransform}} is a transform from an {{XRSpace}} to a relativeTo {{XRSpace}} described by ...

getPose(XRSpace targetSpace, XRSpace baseSpace); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |targetSpace| to |baseSpace|. The transform's {{XRRigidTransform/position}} is the location of the |targetSpace| origin in |baseSpace| coordinates.

An {{XRRigidTransform}} is a transform from a target {{XRSpace}} to a base {{XRSpace}} described by ...

getPose(XRSpace transformedSpace, XRSpace baseSpace); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |transformedSpace| to |baseSpace|. The transform's {{XRRigidTransform/position}} is the location of the |transformedSpace| origin in |baseSpace| coordinates.

An {{XRRigidTransform}} is a transform from a transformed {{XRSpace}} to a base {{XRSpace}} described by ... (This one sounds backwards)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NellWaliczek please see ^^^, trying to get the naming sorted out in relation to your request from #589 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to somewhat redundantly clarify getPose with some matrix math so that the confusion doesn't get too bad. I kinda like source destination.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the naming options above, I prefer targetSpace/baseSpace. Specifically, I worry that sourceSpace/destinationSpace would present a reversed mental model to the way in which getPose will be increasingly used in practice.

As we start to establish plane anchors, mesh anchors, face anchors, etc., there will be an increasing number of static and dynamic entities in WebXR that are tracked over time by XRSpace instances. To place these entities within their scene each frame, developers will use xrFrame.getPose(interestingEntitySpace, myReferenceSpace) and expect to get back the position and orientation of that entity within their engine's world coordinates (which are underwritten by myReferenceSpace), so they can update that entity's scene object transform in their engine. The mental model for getPose in particular will start to become "what's the position and orientation of this single entity in my reference space", moreso than "get me a model transform I can use to generally transform coordinates from this entity space into my reference space".

However, the above is ultimately just me arguing that we should preference naming that's intuitive for the position/orientation attributes on XRRigidTransform rather than the matrix attribute. In the end, @klausw had a key observation in #580 (comment) that applies here:

I'm starting to think that a core part of the confusion is that a tracked object's XRPose has-a XRRigidTransform, and the transform's position component corresponds to the tracked object's coordinates in reference space as expected, but a XRRigidTransform and its matrix are a transform from object space to reference space. That's internally consistent, but it's confusing when you supply a transform by itself as an argument (as in the constructor arg here) because it's easily interpreted as saying "please apply this transform" which would be backwards. In a way, OpenXR's approach of directly putting position + orientation on the pose objects helps avoid this ambiguity.

Ultimately, any naming pattern and/or conceptual framework we come up with that makes position/orientation make more sense will make matrix make less sense, and vice versa.

In OpenXR, we've managed to sidestep entire classes of confusion around the order or direction in which matrices are applied, whether they're pre-multiplied or post-multiplied, whether they are row-major/column-major, etc. by banishing matrices from the API altogether. Rather than expressing transform matrices "from" or "to" various spaces, all poses are expressed only as a position vector and orientation quaternion "in" a given space.

Even for view matrices, as discussed in #447, we found in OpenXR that any non-trivial engine ends up positioning a camera object within its scene in world coordinates anyway, and so even there it ends up being more convenient for developers to reason about the position and orientation of their view rather than getting a view matrix directly. You can see this pattern in OpenXR in xrLocateViews, which returns a set of XrView structs, each containing the position and orientation of a given view.

Specifically in the case of OpenXR reference spaces with a non-identity pose applied:

  • Each reference space type is specified in terms of how its "natural" origin behaves.
  • If you apply a non-identity pose when creating a reference space, that's specified as placing the origin of your reference space at that position and orientation "in" the "natural" reference space.

By defining the origin of any space as either a well-specified location in the physical world or a position and orientation within some other such space, all space origins have a clear binding to a place in the physical world. It then falls out without ambiguity how you "get" the "pose" of any one space "in" another.

If we ourselves continue to struggle with these naming and direction issues each week, despite being embedded in the details of the spec, perhaps it's the lesser evil to just remove matrix from XRRigidTransform? This will forcibly snap the developer's mental model to an unambiguous position and orientation approach throughout the API, and then we can design our naming to be consistent with that model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem here isn't matrix, the problem is the use of the word transform itself.

I point this out here: https://github.com/immersive-web/editor-collab/pull/36#discussion_r278361579 , there are two ways one can look at a transform. Just talking about position/orientation as a "pose" is clear, however when you talk of the "transform" of a position/orientation that can mean opposite things.

If we can carefully clarify our use of the word "transform", talking about matrices isn't hard.

I feel like we should definitely try and keep matrix around for developer convenience.

@klausw
Copy link
Contributor Author

klausw commented Apr 26, 2019

Retiring this PR, please see the replacement #609 "Overhaul XRSpace, get(Viewer)Pose definitions". We can check separately to see if there's pieces in here we'd like to keep, i.e. for the explainer, but that would be a separate new PR if applicable.

@klausw klausw closed this Apr 26, 2019
@toji
Copy link
Member

toji commented Apr 26, 2019

The primary thing that my PR doesn't capture, upon re-reading this, is the explainer changes. While it would be good to ensure that they're consistent, I'm inclined to view updating the explainer on topics like this as less critical, simply because the spec is authoritative and should be rigorous while the explainers are intended to be higher level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants