Update spec and explainer to clarify pose and transform semantics #569

Closed
wants to merge 3 commits into from
Changes from 2 commits
30 changes: 15 additions & 15 deletions index.bs
@@ -641,7 +641,7 @@ An {{XRFrame}} represents a snapshot of the state of all of the tracked objects
readonly attribute XRSession session;

XRViewerPose? getViewerPose(XRReferenceSpace referenceSpace);
XRPose? getPose(XRSpace space, XRSpace relativeTo);
XRPose? getPose(XRSpace sourceSpace, XRSpace destinationSpace);
};
</pre>

@@ -658,20 +658,20 @@ When the <dfn method for="XRFrame">getViewerPose(|referenceSpace|)</dfn> method
1. Let |session| be the {{XRFrame}}'s {{XRFrame/session}} object.
1. If |referenceSpace|'s [=XRSpace/session=] does not equal |session|, throw an {{InvalidStateError}} and abort these steps.
1. If the [=viewer=]'s pose cannot be determined relative to |referenceSpace|, return <code>null</code>.
1. Return a new {{XRViewerPose}} describing the [=viewer=]'s pose relative to the origin of |referenceSpace| at the timestamp of the {{XRFrame}}.
1. Return a new {{XRViewerPose}} with an {{XRRigidTransform}} from the [=viewer=] space to |referenceSpace| at the timestamp of the {{XRFrame}}. The transform's {{XRRigidTransform/position}} is the location of the [=viewer=] in reference space coordinates.

</div>

<div class="algorithm unstable" data-algorithm="get-pose">

When the <dfn method for="XRFrame">getPose(|space|, |relativeTo|)</dfn> method is invoked, the user agent MUST run the following steps:
When the <dfn method for="XRFrame">getPose(|sourceSpace|, |destinationSpace|)</dfn> method is invoked, the user agent MUST run the following steps:

1. If the {{XRFrame}}'s [=active=] boolean is <code>false</code>, throw an {{InvalidStateError}} and abort these steps.
1. Let |session| be the {{XRFrame}}'s {{XRFrame/session}} object.
1. If |space|'s [=XRSpace/session=] does not equal |session|, throw an {{InvalidStateError}} and abort these steps.
1. If |relativeTo|'s [=XRSpace/session=] does not equal |session|, throw an {{InvalidStateError}} and abort these steps.
1. If |space|'s pose cannot be determined relative to |relativeTo|, return <code>null</code>
1. Return a new {{XRPose}} describing |space|'s pose relative to the origin of |relativeTo|.
1. If |destinationSpace|'s [=XRSpace/session=] does not equal |session|, throw an {{InvalidStateError}} and abort these steps.
1. If |sourceSpace|'s [=XRSpace/session=] does not equal |session|, throw an {{InvalidStateError}} and abort these steps.
1. If |sourceSpace|'s pose cannot be determined relative to |destinationSpace|, return <code>null</code>.
1. Return a new {{XRPose}} containing the {{XRRigidTransform}} from |sourceSpace| to |destinationSpace|. The transform's {{XRRigidTransform/position}} is the location of the |sourceSpace| origin in |destinationSpace| coordinates.
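The direction of the returned transform can be illustrated with a minimal JavaScript sketch. It is not the WebXR API: it assumes, purely for illustration, that the user agent knows each space's origin as a rigid transform into one shared internal tracking space, and all names (`rotate`, `quatMul`, `multiply`, `inverse`, `getPoseSketch`) are hypothetical. Under that assumption the result is `inverse(destination) ∘ source`, so its `position` is the source space's origin expressed in destination space coordinates.

```javascript
// Sketch only: plain {x, y, z, w} objects stand in for XRRigidTransform / DOMPointReadOnly.
function cross(a, b) {
  return { x: a.y * b.z - a.z * b.y, y: a.z * b.x - a.x * b.z, z: a.x * b.y - a.y * b.x };
}

// Rotate vector v by unit quaternion q (equivalent to q * v * conj(q)).
function rotate(q, v) {
  const c = cross(q, v); // uses only q.x, q.y, q.z
  const t = { x: 2 * c.x, y: 2 * c.y, z: 2 * c.z };
  const u = cross(q, t);
  return { x: v.x + q.w * t.x + u.x, y: v.y + q.w * t.y + u.y, z: v.z + q.w * t.z + u.z };
}

// Hamilton product a * b: rotation b is applied first, then a.
function quatMul(a, b) {
  return {
    x: a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
    y: a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
    z: a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    w: a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
  };
}

// Compose rigid transforms: the result applies b first, then a.
function multiply(a, b) {
  const p = rotate(a.orientation, b.position);
  return {
    position: { x: a.position.x + p.x, y: a.position.y + p.y, z: a.position.z + p.z, w: 1 },
    orientation: quatMul(a.orientation, b.orientation),
  };
}

// Inverse: conjugate rotation, and the position counter-rotated and negated.
function inverse(t) {
  const q = { x: -t.orientation.x, y: -t.orientation.y, z: -t.orientation.z, w: t.orientation.w };
  const p = rotate(q, t.position);
  return { position: { x: -p.x, y: -p.y, z: -p.z, w: 1 }, orientation: q };
}

// Hypothetical: both arguments map their space into a shared tracking space.
function getPoseSketch(sourceInTracking, destinationInTracking) {
  // source -> tracking, then tracking -> destination
  return multiply(inverse(destinationInTracking), sourceInTracking);
}

const s = Math.SQRT1_2; // sin(45°) == cos(45°)
// Destination (e.g. a reference space): origin at (1, 0, 0), turned 90° about +Y.
const destination = { position: { x: 1, y: 0, z: 0, w: 1 }, orientation: { x: 0, y: s, z: 0, w: s } };
// Source (e.g. a tracked controller): origin at (1, 1, -1), unrotated.
const source = { position: { x: 1, y: 1, z: -1, w: 1 }, orientation: { x: 0, y: 0, z: 0, w: 1 } };

const pose = getPoseSketch(source, destination);
// pose.position ≈ (1, 1, 0): the source origin in destination coordinates.
```

Swapping the two arguments returns the inverse transform, whose position is the destination origin in source coordinates instead.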
Contributor Author

Does anyone have good naming suggestions here? "sourceSpace" and "destinationSpace" seemed like the best fit for describing the transform, but it's a bit clunky when talking about getPose.

Do any of these variants look better? I've added snippets showing how the naming would affect this paragraph and the XRRigidTransform definition below.

getPose(XRSpace sourceSpace, XRSpace destinationSpace); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |sourceSpace| to |destinationSpace|. The transform's {{XRRigidTransform/position}} is the location of the |sourceSpace| origin in |destinationSpace| coordinates.

An {{XRRigidTransform}} is a transform from a source {{XRSpace}} to a destination {{XRSpace}} described by ...

getPose(XRSpace space, XRSpace relativeTo); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |space| to |relativeTo|. The transform's {{XRRigidTransform/position}} is the location of the |space| origin in |relativeTo| coordinates.

An {{XRRigidTransform}} is a transform from an {{XRSpace}} to a relativeTo {{XRSpace}} described by ...

getPose(XRSpace targetSpace, XRSpace baseSpace); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |targetSpace| to |baseSpace|. The transform's {{XRRigidTransform/position}} is the location of the |targetSpace| origin in |baseSpace| coordinates.

An {{XRRigidTransform}} is a transform from a target {{XRSpace}} to a base {{XRSpace}} described by ...

getPose(XRSpace transformedSpace, XRSpace baseSpace); =>

Return a new {{XRPose}} containing the {{XRRigidTransform}} from |transformedSpace| to |baseSpace|. The transform's {{XRRigidTransform/position}} is the location of the |transformedSpace| origin in |baseSpace| coordinates.

An {{XRRigidTransform}} is a transform from a transformed {{XRSpace}} to a base {{XRSpace}} described by ... (This one sounds backwards)

Contributor Author

@NellWaliczek please see ^^^, trying to get the naming sorted out in relation to your request from #589 (comment)

Contributor

I hope to somewhat redundantly clarify getPose with some matrix math so that the confusion doesn't get too bad. I kinda like source/destination.

Contributor

Of the naming options above, I prefer targetSpace/baseSpace. Specifically, I worry that sourceSpace/destinationSpace would present a reversed mental model to the way in which getPose will be increasingly used in practice.

As we start to establish plane anchors, mesh anchors, face anchors, etc., there will be an increasing number of static and dynamic entities in WebXR that are tracked over time by XRSpace instances. To place these entities within their scene each frame, developers will use xrFrame.getPose(interestingEntitySpace, myReferenceSpace) and expect to get back the position and orientation of that entity within their engine's world coordinates (which are underwritten by myReferenceSpace), so they can update that entity's scene object transform in their engine. The mental model for getPose in particular will start to become "what's the position and orientation of this single entity in my reference space", more so than "get me a model transform I can use to generally transform coordinates from this entity space into my reference space".

However, the above is ultimately just me arguing that we should prefer naming that's intuitive for the position/orientation attributes on XRRigidTransform rather than the matrix attribute. In the end, @klausw had a key observation in #580 (comment) that applies here:

I'm starting to think that a core part of the confusion is that a tracked object's XRPose has-a XRRigidTransform, and the transform's position component corresponds to the tracked object's coordinates in reference space as expected, but a XRRigidTransform and its matrix are a transform from object space to reference space. That's internally consistent, but it's confusing when you supply a transform by itself as an argument (as in the constructor arg here) because it's easily interpreted as saying "please apply this transform" which would be backwards. In a way, OpenXR's approach of directly putting position + orientation on the pose objects helps avoid this ambiguity.

Ultimately, any naming pattern and/or conceptual framework we come up with that makes position/orientation make more sense will make matrix make less sense, and vice versa.

In OpenXR, we've managed to sidestep entire classes of confusion around the order or direction in which matrices are applied, whether they're pre-multiplied or post-multiplied, whether they are row-major/column-major, etc. by banishing matrices from the API altogether. Rather than expressing transform matrices "from" or "to" various spaces, all poses are expressed only as a position vector and orientation quaternion "in" a given space.

Even for view matrices, as discussed in #447, we found in OpenXR that any non-trivial engine ends up positioning a camera object within its scene in world coordinates anyway, and so even there it ends up being more convenient for developers to reason about the position and orientation of their view rather than getting a view matrix directly. You can see this pattern in OpenXR in xrLocateViews, which returns a set of XrView structs, each containing the position and orientation of a given view.

Specifically in the case of OpenXR reference spaces with a non-identity pose applied:

  • Each reference space type is specified in terms of how its "natural" origin behaves.
  • If you apply a non-identity pose when creating a reference space, that's specified as placing the origin of your reference space at that position and orientation "in" the "natural" reference space.

By defining the origin of any space as either a well-specified location in the physical world or a position and orientation within some other such space, all space origins have a clear binding to a place in the physical world. It then falls out without ambiguity how you "get" the "pose" of any one space "in" another.

If we ourselves continue to struggle with these naming and direction issues each week, despite being embedded in the details of the spec, perhaps it's the lesser evil to just remove matrix from XRRigidTransform? This will forcibly snap the developer's mental model to an unambiguous position and orientation approach throughout the API, and then we can design our naming to be consistent with that model.

Contributor

I think the problem here isn't matrix, the problem is the use of the word transform itself.

I point this out here: https://github.com/immersive-web/editor-collab/pull/36#discussion_r278361579. There are two ways one can look at a transform. Just talking about position/orientation as a "pose" is clear; however, when you talk of the "transform" of a position/orientation, that can mean opposite things.

If we can carefully clarify our use of the word "transform", talking about matrices isn't hard.

I feel like we should definitely try and keep matrix around for developer convenience.


</div>

@@ -681,7 +681,7 @@ Spaces {#spaces}
XRSpace {#xrspace-interface}
------------------

An {{XRSpace}} describes an entity that is tracked by the [=/XR device=]'s tracking systems. {{XRSpace}}s MAY NOT have a fixed spatial relationship to one another or to any given {{XRReferenceSpace}}. The transform between two {{XRSpace}} can be evaluated by calling an {{XRFrame}}'s {{XRFrame/getPose()}} method. The interface is intentionally opaque.
An {{XRSpace}} describes an entity that is tracked by the [=/XR device=]'s tracking systems. It conceptually corresponds to a specific origin and orientation, but does not directly expose any numerical coordinate values. {{XRSpace}}s MAY NOT have a fixed spatial relationship to one another or to any given {{XRReferenceSpace}}. The current transform between two {{XRSpace}}s can be evaluated by calling an {{XRFrame}}'s {{XRFrame/getPose()}} method. The interface is intentionally opaque.
Member

"current" is a hard concept to pin down in WebXR. This whole sentence could probably stand to be rephrased to clarify that. Something along the lines of:

"The transform between two {{XRSpace}}s at the time represented by an {{XRFrame}} can be evaluated by calling the frame's {{XRFrame/getPose()}} method."

Contributor Author

Changed in 30ca9d2 . I've also rephrased the previous sentence to make it explicit that the spatial relationship can change even if there's no real-world movement. New text:

"The spatial relationships between {{XRSpace}}s or {{XRReferenceSpace}}s change over time in response to movement of tracked entities, but can also change without real-world movement as a result of ongoing tracking system recalibration. The transform between two {{XRSpace}}s at the time represented by an {{XRFrame}} can be evaluated by calling the frame's {{XRFrame/getPose()}} method."

Contributor Author

Another reason for rephrasing that sentence was that I wanted to avoid the MAY NOT construction: that's not one of the RFC 2119 key terms, and it could be misinterpreted as disallowing fixed spatial relationships as opposed to warning not to assume them.


<pre class="idl">
[SecureContext, Exposed=Window] interface XRSpace : EventTarget {
@@ -746,7 +746,7 @@ Note: The {{position-disabled}} subtype is primarily intended for use with pre-r
Devices that support {{XRReferenceSpaceType/stationary}} reference spaces MUST support all {{XRStationaryReferenceSpaceSubtype}}s.

<section class="unstable">
The <dfn attribute for="XRReferenceSpace">originOffset</dfn> attribute is a {{XRRigidTransform}} that describes an additional translation and rotation to be applied to any poses queried using the {{XRReferenceSpace}}. It is initially set to an [=identity transform=]. Changes to the {{originOffset}} take effect immediately, and subsequent poses queried with the {{XRReferenceSpace}} will take into account the new transform.
The <dfn attribute for="XRReferenceSpace">originOffset</dfn> attribute is a settable {{XRRigidTransform}} that affects all poses queried using that {{XRReferenceSpace}}. It is a transform from the effective {{XRReferenceSpace}} to the underlying unmodified {{XRReferenceSpace}}, and is initially set to an [=identity transform=]. Changes to the {{originOffset}} take effect immediately. Subsequent poses queried with the {{XRReferenceSpace}} are expressed in the effective reference space: each is the pose that would have been returned from the same reference space with an identity {{originOffset}}, with the inverse of the {{originOffset}} applied to it.
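A minimal JavaScript sketch of that rule, using plain objects in place of {{XRRigidTransform}}; the helper names and the example values are hypothetical. The effective pose is the unmodified pose with `inverse(originOffset)` composed on the left.

```javascript
// Hypothetical helpers; plain {x, y, z, w} objects stand in for DOMPointReadOnly.
function cross(a, b) {
  return { x: a.y * b.z - a.z * b.y, y: a.z * b.x - a.x * b.z, z: a.x * b.y - a.y * b.x };
}

// Rotate vector v by unit quaternion q (equivalent to q * v * conj(q)).
function rotate(q, v) {
  const c = cross(q, v);
  const t = { x: 2 * c.x, y: 2 * c.y, z: 2 * c.z };
  const u = cross(q, t);
  return { x: v.x + q.w * t.x + u.x, y: v.y + q.w * t.y + u.y, z: v.z + q.w * t.z + u.z };
}

// Hamilton product a * b.
function quatMul(a, b) {
  return {
    x: a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
    y: a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
    z: a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    w: a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
  };
}

// Compose rigid transforms: apply b first, then a.
function multiply(a, b) {
  const p = rotate(a.orientation, b.position);
  return {
    position: { x: a.position.x + p.x, y: a.position.y + p.y, z: a.position.z + p.z, w: 1 },
    orientation: quatMul(a.orientation, b.orientation),
  };
}

// Inverse rigid transform.
function inverse(t) {
  const q = { x: -t.orientation.x, y: -t.orientation.y, z: -t.orientation.z, w: t.orientation.w };
  const p = rotate(q, t.position);
  return { position: { x: -p.x, y: -p.y, z: -p.z, w: 1 }, orientation: q };
}

const s = Math.SQRT1_2;
// Pose a query would return with an identity originOffset: viewer 1.6 m up.
const unmodifiedPose = { position: { x: 0, y: 1.6, z: 0, w: 1 }, orientation: { x: 0, y: 0, z: 0, w: 1 } };
// originOffset (effective -> unmodified): effective origin 2 m along -Z, turned 90° about +Y.
const originOffset = { position: { x: 0, y: 0, z: -2, w: 1 }, orientation: { x: 0, y: s, z: 0, w: s } };

const effectivePose = multiply(inverse(originOffset), unmodifiedPose);
// effectivePose.position ≈ (-2, 1.6, 0): the viewer in effective-space coordinates.
```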

Note: Changing the {{originOffset}} between pose queries in a single [=XR animation frame=] is not advised, since it will cause inconsistencies in the tracking data and rendered output.
</section>
@@ -849,7 +849,7 @@ The <dfn attribute for="XRView">eye</dfn> attribute describes which eye this vie

The <dfn attribute for="XRView">projectionMatrix</dfn> attribute provides a [=matrix=] describing the projection to be used when rendering the [=view=]. It is <b>strongly recommended</b> that applications use this matrix without modification or decomposition. The {{projectionMatrix}} MAY include transformations such as shearing that prevent the projection from being accurately described by a simple frustum. Failure to use the provided projection matrices when rendering may cause the presented frame to be distorted or badly aligned, resulting in varying degrees of user discomfort.

The <dfn attribute for="XRView">transform</dfn> attribute is the {{XRRigidTransform}} of the viewpoint.
The <dfn attribute for="XRView">transform</dfn> attribute is the {{XRRigidTransform}} from viewpoint space to the reference space used for the parent {{XRViewerPose}} query.

NOTE: The {{XRView/transform}} can be used to position camera objects in many rendering libraries. If a more traditional view matrix is needed by the application one can be retrieved by calling `view.transform.inverse.matrix`.
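As a hedged illustration of this note, the sketch below builds a view matrix by inverting a hypothetical view transform and writing the result as a 16-element column-major array, standing in for `view.transform.inverse.matrix`. It assumes the usual convention that a view looks down its local -Z axis; all helper names and values are hypothetical.

```javascript
// Hypothetical helpers; plain objects stand in for XRRigidTransform.
function cross(a, b) {
  return { x: a.y * b.z - a.z * b.y, y: a.z * b.x - a.x * b.z, z: a.x * b.y - a.y * b.x };
}

// Rotate vector v by unit quaternion q.
function rotate(q, v) {
  const c = cross(q, v);
  const t = { x: 2 * c.x, y: 2 * c.y, z: 2 * c.z };
  const u = cross(q, t);
  return { x: v.x + q.w * t.x + u.x, y: v.y + q.w * t.y + u.y, z: v.z + q.w * t.z + u.z };
}

// Inverse rigid transform: conjugate rotation, counter-rotated negated position.
function inverse(t) {
  const q = { x: -t.orientation.x, y: -t.orientation.y, z: -t.orientation.z, w: t.orientation.w };
  const p = rotate(q, t.position);
  return { position: { x: -p.x, y: -p.y, z: -p.z, w: 1 }, orientation: q };
}

// 16-element column-major matrix for a rigid transform.
function toMatrix(t) {
  const { x, y, z, w } = t.orientation;
  const p = t.position;
  return [
    1 - 2 * (y * y + z * z), 2 * (x * y + z * w), 2 * (x * z - y * w), 0,
    2 * (x * y - z * w), 1 - 2 * (x * x + z * z), 2 * (y * z + x * w), 0,
    2 * (x * z + y * w), 2 * (y * z - x * w), 1 - 2 * (x * x + y * y), 0,
    p.x, p.y, p.z, 1,
  ];
}

// Returns M * v for a column-major matrix M and a column vector v.
function transformPoint(m, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) out[row] += m[col * 4 + row] * v[col];
  }
  return out;
}

const s = Math.SQRT1_2;
// Hypothetical view transform: eye 1.6 m up, turned 90° about +Y, so the
// view's forward (-Z) axis points along reference-space -X.
const viewTransform = { position: { x: 0, y: 1.6, z: 0, w: 1 }, orientation: { x: 0, y: s, z: 0, w: s } };

// Counterpart of view.transform.inverse.matrix.
const viewMatrix = toMatrix(inverse(viewTransform));

const eye = transformPoint(viewMatrix, [0, 1.6, 0, 1]);    // ≈ [0, 0, 0, 1]: eye at view-space origin
const ahead = transformPoint(viewMatrix, [-1, 1.6, 0, 1]); // ≈ [0, 0, -1, 1]: 1 m straight ahead
```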
</section>
@@ -920,7 +920,7 @@ To <dfn>normalize</dfn> a list of components the UA MUST perform the following s
XRRigidTransform {#xrrigidtransform-interface}
----------------

An {{XRRigidTransform}} is a transform described by a {{XRRigidTransform/position}} and {{XRRigidTransform/orientation}}. When interpreting an {{XRRigidTransform}} the {{XRRigidTransform/orientation}} is always applied prior to the {{XRRigidTransform/position}}.
An {{XRRigidTransform}} is a transform from a source {{XRSpace}} to a destination {{XRSpace}} described by a {{XRRigidTransform/position}} and {{XRRigidTransform/orientation}}. When applying an {{XRRigidTransform}} the {{XRRigidTransform/orientation}} is always applied prior to the {{XRRigidTransform/position}}. Effectively, the source space is rotated around its origin using the {{XRRigidTransform/orientation}}, and then the source space's origin is translated to the {{XRRigidTransform/position}} in destination space.
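The rotate-then-translate order can be checked with a small JavaScript sketch. Plain `{x, y, z, w}` objects stand in for `DOMPointReadOnly`, and `rotate` / `applyRigidTransform` are hypothetical helpers, not part of the API.

```javascript
// Hypothetical helpers; plain {x, y, z, w} objects stand in for DOMPointReadOnly.
function cross(a, b) {
  return { x: a.y * b.z - a.z * b.y, y: a.z * b.x - a.x * b.z, z: a.x * b.y - a.y * b.x };
}

// Rotate vector v by unit quaternion q (equivalent to q * v * conj(q)).
function rotate(q, v) {
  const c = cross(q, v); // uses only q.x, q.y, q.z
  const t = { x: 2 * c.x, y: 2 * c.y, z: 2 * c.z };
  const u = cross(q, t);
  return { x: v.x + q.w * t.x + u.x, y: v.y + q.w * t.y + u.y, z: v.z + q.w * t.z + u.z };
}

// Map source-space coordinates to destination-space coordinates:
// orientation first (about the source origin), then position.
function applyRigidTransform(transform, p) {
  const r = rotate(transform.orientation, p);
  const t = transform.position;
  return { x: r.x + t.x, y: r.y + t.y, z: r.z + t.z };
}

const s = Math.SQRT1_2; // sin(45°) == cos(45°)
// 90° about +Y; the source origin sits at (2, 0, 0) in destination space.
const transform = {
  position: { x: 2, y: 0, z: 0, w: 1 },
  orientation: { x: 0, y: s, z: 0, w: s },
};

const origin = applyRigidTransform(transform, { x: 0, y: 0, z: 0 }); // (2, 0, 0)
const tip = applyRigidTransform(transform, { x: 1, y: 0, z: 0 });    // ≈ (2, 0, -1)

// Translating (1, 0, 0) first and rotating second lands elsewhere: ≈ (0, 0, -3).
const wrongOrder = rotate(transform.orientation, { x: 3, y: 0, z: 0 });
```

The source origin always lands exactly on {{XRRigidTransform/position}}, which is why the position reads as "where the source origin is in destination space".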

<pre class="idl">
[SecureContext, Exposed=Window,
@@ -948,13 +948,13 @@ The <dfn constructor for="XRRigidTransform">XRRigidTransform(|position|, |orient

</div>

The <dfn attribute for="XRRigidTransform">position</dfn> attribute is a 3-dimensional point, given in meters, describing the translation component of the transform. The {{XRRigidTransform/position}}'s {{DOMPointReadOnly/w}} attribute MUST be <code>1.0</code>.
The <dfn attribute for="XRRigidTransform">position</dfn> attribute is a 3-dimensional point, given in meters, for the source space's origin location in the destination space's coordinate system. The {{XRRigidTransform/position}}'s {{DOMPointReadOnly/w}} attribute MUST be <code>1.0</code>.

The <dfn attribute for="XRRigidTransform">orientation</dfn> attribute is a quaternion describing the rotational component of the transform. The {{XRRigidTransform/orientation}} MUST be normalized to have a length of <code>1.0</code>.

The <dfn attribute for="XRRigidTransform">matrix</dfn> attribute returns the transform described by the {{XRRigidTransform/position}} and {{XRRigidTransform/orientation}} attributes as a [=matrix=]. This attribute SHOULD be lazily evaluated.
The <dfn attribute for="XRRigidTransform">matrix</dfn> attribute returns the transform described by the {{XRRigidTransform/position}} and {{XRRigidTransform/orientation}} attributes as a [=matrix=]. This attribute SHOULD be lazily evaluated. The top left 3x3 sub-matrix is the rotation matrix corresponding to the {{XRRigidTransform/orientation}}; its columns are the source space's axis directions, given as unit vectors in destination space coordinates. The fourth column contains the {{XRRigidTransform/position}}. Premultiplying this matrix onto a column vector of source space coordinates produces a column vector of destination space coordinates.
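A sketch of the layout described above, assuming the spec's [=matrix=] convention of 16 elements in column-major order. A plain array stands in for the `Float32Array`, and the function names are hypothetical.

```javascript
// Column-major 4x4 matrix for a rigid transform (hypothetical helper).
// Columns 0-2: source X/Y/Z axes in destination coordinates; column 3: position.
function rigidTransformMatrix(position, q) {
  const { x, y, z, w } = q;
  return [
    1 - 2 * (y * y + z * z), 2 * (x * y + z * w), 2 * (x * z - y * w), 0, // column 0
    2 * (x * y - z * w), 1 - 2 * (x * x + z * z), 2 * (y * z + x * w), 0, // column 1
    2 * (x * z + y * w), 2 * (y * z - x * w), 1 - 2 * (x * x + y * y), 0, // column 2
    position.x, position.y, position.z, 1,                                 // column 3
  ];
}

// Premultiply: returns M * v for a column vector v = [x, y, z, w].
function mulMat4Vec4(m, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      out[row] += m[col * 4 + row] * v[col];
    }
  }
  return out;
}

const s = Math.SQRT1_2;
// 90° about +Y, source origin at (2, 0, 0) in destination space.
const m = rigidTransformMatrix({ x: 2, y: 0, z: 0 }, { x: 0, y: s, z: 0, w: s });

const sourceXAxis = m.slice(0, 3);           // ≈ [0, 0, -1]: source +X in destination space
const translation = m.slice(12, 15);         // [2, 0, 0]: the position
const mapped = mulMat4Vec4(m, [1, 0, 0, 1]); // ≈ [2, 0, -1, 1]
```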

The <dfn attribute for="XRRigidTransform">inverse</dfn> attribute returns a {{XRRigidTransform}} which, if applied to an object that had previously been transformed by the original {{XRRigidTransform}}, would undo the transform and return the object to it's initial pose. This attribute SHOULD be lazily evaluated.
The <dfn attribute for="XRRigidTransform">inverse</dfn> attribute is an inverted {{XRRigidTransform}} that transforms from destination space to source space. If applied to an object that had previously been transformed by the original {{XRRigidTransform}}, it would undo the transform and return the object to its initial pose. This attribute SHOULD be lazily evaluated.
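One common way to compute the inverse, shown as a hypothetical JavaScript helper rather than a required algorithm: for a unit quaternion the inverse rotation is its conjugate, and the inverse translation is the original position, counter-rotated and negated.

```javascript
// Hypothetical helpers; plain {x, y, z, w} objects stand in for DOMPointReadOnly.
function cross(a, b) {
  return { x: a.y * b.z - a.z * b.y, y: a.z * b.x - a.x * b.z, z: a.x * b.y - a.y * b.x };
}

// Rotate vector v by unit quaternion q.
function rotate(q, v) {
  const c = cross(q, v);
  const t = { x: 2 * c.x, y: 2 * c.y, z: 2 * c.z };
  const u = cross(q, t);
  return { x: v.x + q.w * t.x + u.x, y: v.y + q.w * t.y + u.y, z: v.z + q.w * t.z + u.z };
}

// Source -> destination: rotate, then translate.
function applyRigidTransform(t, p) {
  const r = rotate(t.orientation, p);
  return { x: r.x + t.position.x, y: r.y + t.position.y, z: r.z + t.position.z };
}

// Destination -> source: undo the translation, then undo the rotation.
function inverseRigidTransform(t) {
  const q = t.orientation;
  const qInv = { x: -q.x, y: -q.y, z: -q.z, w: q.w }; // conjugate of a unit quaternion
  const p = rotate(qInv, t.position);
  return { position: { x: -p.x, y: -p.y, z: -p.z, w: 1 }, orientation: qInv };
}

const s = Math.SQRT1_2;
const transform = { position: { x: 2, y: 0, z: 0, w: 1 }, orientation: { x: 0, y: s, z: 0, w: s } };
const inv = inverseRigidTransform(transform);
// inv.position ≈ (0, 0, -2): the destination origin in source coordinates.

const point = { x: 0.5, y: 1, z: -3 };
const roundTrip = applyRigidTransform(inv, applyRigidTransform(transform, point)); // ≈ point
```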

An {{XRRigidTransform}} with a {{XRRigidTransform/position}} of <code>{ x: 0, y: 0, z: 0, w: 1 }</code> and an {{XRRigidTransform/orientation}} of <code>{ x: 0, y: 0, z: 0, w: 1 }</code> is known as an <dfn>identity transform</dfn>.

@@ -1481,7 +1481,7 @@ dictionary XRReferenceSpaceEventInit : EventInit {

The <dfn attribute for="XRReferenceSpaceEvent">referenceSpace</dfn> attribute indicates the {{XRReferenceSpace}} that generated this event.

The <dfn attribute for="XRReferenceSpaceEvent">transform</dfn> attribute describes the transform the {{XRReferenceSpaceEvent/referenceSpace}} underwent during this event, if applicable.
The <dfn attribute for="XRReferenceSpaceEvent">transform</dfn> attribute describes the transform the {{XRReferenceSpaceEvent/referenceSpace}} underwent during this event, if applicable, providing the transform from the old state's space to the new state's space.
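Under the source-to-destination reading defined for {{XRRigidTransform}}, an application could re-express a point it stored in the old space's coordinates by applying {{XRReferenceSpaceEvent/transform}} to it. A hedged sketch with hypothetical helper names and made-up values:

```javascript
// Hypothetical helpers; plain {x, y, z, w} objects stand in for DOMPointReadOnly.
function cross(a, b) {
  return { x: a.y * b.z - a.z * b.y, y: a.z * b.x - a.x * b.z, z: a.x * b.y - a.y * b.x };
}

// Rotate vector v by unit quaternion q.
function rotate(q, v) {
  const c = cross(q, v);
  const t = { x: 2 * c.x, y: 2 * c.y, z: 2 * c.z };
  const u = cross(q, t);
  return { x: v.x + q.w * t.x + u.x, y: v.y + q.w * t.y + u.y, z: v.z + q.w * t.z + u.z };
}

// Old-space coordinates -> new-space coordinates: rotate, then translate.
function applyRigidTransform(t, p) {
  const r = rotate(t.orientation, p);
  return { x: r.x + t.position.x, y: r.y + t.position.y, z: r.z + t.position.z };
}

// Made-up event transform: the old origin sits at (0, 0, 1) in the new space,
// and the old axes are turned 180° about +Y.
const eventTransform = { position: { x: 0, y: 0, z: 1, w: 1 }, orientation: { x: 0, y: 1, z: 0, w: 0 } };

const storedInOld = { x: 0, y: 0, z: -3 }; // e.g. a point saved before the reset
const storedInNew = applyRigidTransform(eventTransform, storedInOld); // (0, 0, 4)
```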

Event Types {#event-types}
-----------