Add Appendix A on sato examples #253

Merged 7 commits on Oct 7, 2024
97 changes: 91 additions & 6 deletions index.bs
@@ -108,6 +108,7 @@ url: https://www.iso.org/standard/66067.html; spec: HEIF; type: dfn;
text: aux_type
text: AuxiliaryTypeInfoBox
text: AuxiliaryTypeProperty
text: bits_per_channel

url: https://www.iso.org/standard/68960.html; spec: ISOBMFF; type: dfn;
text: compatible_brands
@@ -134,7 +135,7 @@ url: https://www.iso.org/standard/68960.html; spec: ISOBMFF; type: dfn;

url: https://www.iso.org/standard/74417.html; spec: MIAF; type: dfn;
text: miaf
text: primary image
text: primary image item
text: MIAF image item
text: MIAF image sequence
text: MIAF auxiliary image item
@@ -406,7 +407,7 @@ No color space conversion, matrix coefficients, or transfer characteristics func

The output reconstructed image is made up of the output samples, whose values shall be each clamped to fit in the number of bits per sample as defined by the <code>'[=pixi=]'</code> property of the reconstructed image item. The <code>[=full_range_flag=]</code> field of the <code>'[=colr=]'</code> property of <code>[=colour_type=]</code> <code>'[=nclx=]'</code> also defines a range of values to clamp to, as defined in [[!CICP]].
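The following is a minimal, non-normative Python sketch of this clamping step. It assumes the usual limited-range bounds from [[!CICP]] (16..235 for luma and 16..240 for chroma at 8 bits, scaled by the bit depth); the function name and parameters are illustrative only.

```python
def clamp_output_sample(value, bits_per_channel, full_range, is_luma=True):
    # bits_per_channel comes from the 'pixi' property of the reconstructed
    # image item; full_range mirrors the full_range_flag of an 'nclx' 'colr'
    # property. The limited-range bounds below are an assumption of this
    # sketch, based on the usual CICP video range values.
    if full_range:
        lo, hi = 0, (1 << bits_per_channel) - 1
    else:
        shift = bits_per_channel - 8
        lo = 16 << shift
        hi = (235 if is_luma else 240) << shift
    return max(lo, min(value, hi))
```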

<div class="example">An 8-bit primary [=MIAF image item=] can be combined with another 8-bit hidden [=MIAF image item=], both used as input image items to a [=Sample Transform Derived Image Item=], with an expression corresponding to ReconstructedSample = 256 × PrimarySample + HiddenSample. The primary [=MIAF image item=] and the [=Sample Transform Derived Image Item=] are both part of the same <code>'[=AVIF/altr=]'</code> group. This will be perceived as a backward-compatible regular 8-bit image to readers not supporting [=Sample Transform Derived Image Items=], and can be decoded as a 16-bit image otherwise, making that pattern a bit-depth extension mechanism.</div>
NOTE: [[#sato-examples]] contains examples of Sample Transform Derived Image Item usage.

<h5 id="sample-transform-syntax">Syntax</h5>

@@ -659,7 +660,7 @@ The result of any computation underflowing or overflowing the intermediate bit d

[=Sample Transform Derived Image Items=] use the postfix notation to evaluate the result of the whole expression for each reconstructed image item sample.

- <assert>The [=sato/tokens=] shall be evaluated in the order they are defined in the metadata (the <code>SampleTransform</code> structure) of the [=Sample Transform Derived Image Item=].</assert>
- <assert>The [=sato/tokens=] shall be evaluated in the order they are defined in the metadata (the <code><dfn export>SampleTransform</dfn></code> structure) of the [=Sample Transform Derived Image Item=].</assert>
- <assert><code>[=sato/token=]</code> shall be at most <code>[=reference_count=]</code> when evaluating a sample [=sato/operand=] (when <math><mn>1</mn><mo>≤</mo><mi>token</mi><mo>≤</mo><mn>32</mn></math>).</assert>
- <assert>There shall be at least one <code>[=sato/token=]</code>.</assert>
- The stack is empty before evaluating the first <code>[=sato/token=]</code>.
@@ -697,12 +698,12 @@ The '<dfn for="AVIF">ster</dfn>' entity group as defined in [[!HEIF]] may be use
The brand to identify [=AV1 image items=] is <dfn export for="AVIF Image brand">avif</dfn>.

Files that indicate this brand in the <code>[=compatible_brands=]</code> field of the <code>[=FileTypeBox=]</code> shall comply with the following:
- <assert>The primary item shall be an [=AV1 Image Item=] or be a derived image that references directly or indirectly one or more items that all are [=AV1 Image Items=].</assert>
- <assert>The [=primary image item=] shall be an [=AV1 Image Item=] or be a derived image that references directly or indirectly one or more items that all are [=AV1 Image Items=].</assert>
Collaborator:
The term "primary item" is correct. It comes from the PrimaryItemBox ('pitm'). It is used in ISOBMFF, HEIF, and MIAF.

The term "primary image item" is used in HEIF and MIAF and occurs only four and five times respectively (much less frequent than "primary item").

It seems that these standards use these two terms interchangeably.

Collaborator (Author):
I wanted to reference it with [= =]. Do you prefer:

  • To keep as is
  • To revert to primary item
  • To use [=primary item=] and add primary item to the ISOBMFF dfn

- [=AV1 auxiliary image items=] may be present in the file.

<assert>Files that conform with these constraints should include the brand <code>[=avif=]</code> in the <code>[=compatible_brands=]</code> field of the <code>[=FileTypeBox=]</code>.</assert>

Additionally, the brand <dfn export for="AVIF Intra-only brand">avio</dfn> is defined. If the file indicates the brand <code>[=avio=]</code> in the <code>[=compatible_brands=]</code> field of the <code>[=FileTypeBox=]</code>, then <assert>the primary item or all the items referenced by the primary item shall be [=AV1 image items=] made only of [=Intra Frames=]</assert>. Conversely, <assert>if the previous constraint applies, the brand <code>[=avio=]</code> should be used in the <code>[=compatible_brands=]</code> field of the <code>[=FileTypeBox=]</code></assert>.
Additionally, the brand <dfn export for="AVIF Intra-only brand">avio</dfn> is defined. If the file indicates the brand <code>[=avio=]</code> in the <code>[=compatible_brands=]</code> field of the <code>[=FileTypeBox=]</code>, then <assert>the [=primary image item=] or all the items referenced by the [=primary image item=] shall be [=AV1 image items=] made only of [=Intra Frames=]</assert>. Conversely, <assert>if the previous constraint applies, the brand <code>[=avio=]</code> should be used in the <code>[=compatible_brands=]</code> field of the <code>[=FileTypeBox=]</code></assert>.

<h3 id="image-sequence-brand">AVIF image sequence brands</h3>
The brand to identify AVIF image sequences is <dfn export for="AVIF Image Sequence brand">avis</dfn>.
@@ -732,7 +733,7 @@ NOTE: This constraint further restricts files compared to [[!MIAF]].

The profiles defined in this section are for enabling interoperability between [=AV1 Image File Format=] files and [=AV1 Image File Format=] readers/parsers. A profile imposes a set of specific restrictions and is signaled by brands defined in this specification.

<assert>The <code>[=FileTypeBox=]</code> should declare at least one profile that enables decoding of the primary image item.</assert> It is not an error for the encoder to include an auxiliary image that is not allowed by the specified profile(s).
<assert>The <code>[=FileTypeBox=]</code> should declare at least one profile that enables decoding of the [=primary image item=].</assert> It is not an error for the encoder to include an auxiliary image that is not allowed by the specified profile(s).

<assert>If <code>'[=avis=]'</code> is declared in the <code>[=FileTypeBox=]</code> and a profile is declared in the <code>[=FileTypeBox=]</code>, the profile shall also enable decoding of at least one image sequence track.</assert> <assert>The profile should allow decoding of any associated auxiliary image sequence tracks, unless it is acceptable to decode the image sequence without its auxiliary image sequence tracks.</assert>

@@ -1194,7 +1195,91 @@ The "Version(s)" column in the following table lists the version(s) of the boxes
- <a href="https://github.com/AOMediaCodec/av1-avif/pull/239">Add information on tmap, grpl and altr</a>
- <a href="https://github.com/AOMediaCodec/av1-avif/pull/228">Replace recommendations regarding still picture flags in image items by a note</a>
- <a href="https://github.com/AOMediaCodec/av1-avif/pull/224">Add section 4.2.2 "Sample Transform Derived Image Item"</a>
- <a href="https://github.com/AOMediaCodec/av1-avif/pull/253">Add Appendix A "Sample Transform Derived Image Item Examples"</a>
- <a href="https://github.com/AOMediaCodec/av1-avif/pull/232">Add restriction on usage of clap property</a>
- <a href="https://github.com/AOMediaCodec/av1-avif/pull/240">Adopt MIAF shared constraints</a>
- EDITORIAL: <a href="https://github.com/AOMediaCodec/av1-avif/pull/251">Clean up usage of dfn and linking</a>
- <a href="https://github.com/AOMediaCodec/av1-avif/pull/250">Clarify required versions of non-essential item properties</a>

<h2 id="sato-examples">Appendix A: Sample Transform Derived Image Item Examples</h2>

This informative appendix contains example recipes for extending base AVIF features with [=Sample Transform Derived Image Items=].

<h3 id="sato-example-bit-depth-extension">Bit depth extension</h3>

[=Sample Transform Derived Image Items=] allow for more than 12 bits per channel per sample by combining several [=AV1 image items=] in multiple ways.

<h4 id="sato-example-suffix-bit-depth-extension">Suffix bit depth extension</h4>

The following example describes how to leverage a [=Sample Transform Derived Image Item=] on top of a regular 8-bit [=MIAF image item=] to extend the decoded bit depth to 16 bits.

Consider the following:
- A [=MIAF image item=] being a losslessly coded image item,<br>and its <code>'[=pixi=]'</code> property with <code>[=bits_per_channel=]</code>=8,
- Another [=MIAF image item=] being a lossily or losslessly coded image item with the same dimensions and number of samples as the first input image item,<br>and its <code>'[=pixi=]'</code> property with <code>[=bits_per_channel=]</code>=8,
- A [=Sample Transform Derived Image Item=] with the two items above as input in this order,<br>and its <code>'[=pixi=]'</code> property with <code>[=bits_per_channel=]</code>=16,<br>and the following <code>[=SampleTransform=]</code> fields:
- <code>[=sato/version=]</code>=0
- <code>[=sato/bit_depth=]</code>=2 (signed 32-bit <code>[=sato/constant=]</code>s, stack values and intermediate results)
- <code>[=sato/token_count=]</code>=5
- <code>[=sato/token=]</code>=0, <code>[=sato/constant=]</code>=256
- <code>[=sato/token=]</code>=1 (sample from 1<sup>st</sup> input image item)
- <code>[=sato/token=]</code>=130 (product)
- <code>[=sato/token=]</code>=2 (sample from 2<sup>nd</sup> input image item)
- <code>[=sato/token=]</code>=128 (sum)

This is equivalent to the following postfix notation (parentheses for clarity):

<math><msub><mi>sample</mi><mi>output</mi></msub><mo>=</mo><mo>(</mo><mn>256</mn><mspace width="1ch"/><msub><mi>sample</mi><mn>1</mn></msub><mo>×</mo><mo>)</mo><msub><mi>sample</mi><mn>2</mn></msub><mo>+</mo></math>

This is equivalent to the following infix notation:

<math><msub><mi>sample</mi><mi>output</mi></msub><mo>=</mo><mn>256</mn><mo>×</mo><msub><mi>sample</mi><mn>1</mn></msub><mo>+</mo><msub><mi>sample</mi><mn>2</mn></msub></math>

Each output sample is equal to the sum of a sample of the first input image item shifted to the left by 8 and of a sample of the second input image item. This can be viewed as a bit depth extension of the first input image item by the second input image item. The first input image item contains the 8 most significant bits and the second input image item contains the 8 least significant bits of the output reconstructed image item which has a bit depth of 16, something that is impossible to achieve with a single [=AV1 image item=].
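As a non-normative illustration, the following Python sketch splits a 16-bit source sample into the two 8-bit input samples and then evaluates the token list above in postfix order. All names are hypothetical, and only the operators used in this example are implemented.

```python
def split_suffix(sample_original):
    # Encoder side: the first input image item carries the 8 most significant
    # bits and must be losslessly coded; the second carries the 8 least
    # significant bits and may be lossily coded.
    return sample_original >> 8, sample_original & 0xFF

def evaluate_sato(tokens, input_samples):
    # Postfix evaluation of a SampleTransform token list for one sample
    # position. `tokens` is a list of (token, constant) pairs; `input_samples`
    # holds the co-located samples of the input image items. In the real
    # format, intermediate results here are signed 32-bit (bit_depth=2).
    stack = []
    for token, constant in tokens:
        if token == 0:                 # constant operand
            stack.append(constant)
        elif 1 <= token <= 32:         # sample operand from the token-th input item
            stack.append(input_samples[token - 1])
        elif token == 128:             # sum (commutative, so pop order is immaterial)
            stack.append(stack.pop() + stack.pop())
        elif token == 130:             # product
            stack.append(stack.pop() * stack.pop())
        else:
            raise NotImplementedError("operator not needed for this example")
    return stack.pop()

# ReconstructedSample = 256 * sample_1 + sample_2
tokens = [(0, 256), (1, None), (130, None), (2, None), (128, None)]
msb, lsb = split_suffix(0xABCD)
assert evaluate_sato(tokens, [msb, lsb]) == 0xABCD
```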

NOTE: If the first input image item is the [=primary image item=] and is enclosed in an <code>'[=AVIF/altr=]'</code> group with the [=Sample Transform Derived Image Item=], the first input image item is also a backward-compatible 8-bit regular coded image item that can be used by readers that do not support [=Sample Transform Derived Image Items=] or do not need extra precision.

NOTE: The second input image item loses its meaning of least significant part if any of the most significant bits changes, so the first input image item has to be losslessly encoded. The second input image item supports reasonable loss during encoding.

NOTE: This pattern can be used for reconstructed bit depths beyond 16 by combining more than two input image items or with various input bit depth configurations and operations.

<h4 id="sato-example-residual-bit-depth-extension">Residual bit depth extension</h4>

The following example describes how to leverage a [=Sample Transform Derived Image Item=] on top of a regular 12-bit [=MIAF image item=] to extend the decoded bit depth to 16 bits.<br>
It differs from [[#sato-example-suffix-bit-depth-extension]] in that its slightly longer series of operations allows the first input image item to be lossily encoded.

Consider the following:
- A [=MIAF image item=] being a lossily coded image item,<br>and its <code>'[=pixi=]'</code> property with <code>[=bits_per_channel=]</code>=12,
- Another [=MIAF image item=] being a lossily or losslessly coded image item with the same dimensions and number of samples as the first input image item,<br>and its <code>'[=pixi=]'</code> property with <code>[=bits_per_channel=]</code>=8,<br>with the following constraints:
<li style="list-style: none"><ul><li style="list-style: none">For each sample position in each plane,<br><math><msub><mi>sample</mi><mi>original</mi></msub></math> being the value of the 16-bit original sample at that position in that plane,<br><math><msub><mi>sample</mi><mi>1</mi></msub></math> being the value of the 12-bit sample of the first input image at that position in that plane,<br><math><msub><mi>sample</mi><mi>2</mi></msub></math> being the value of the sample of the second input image at that position in that plane,<br><math><mo>≈</mo></math> representing similarity within compression loss range,</li></ul></li>
- <math><msub><mi>sample</mi><mi>1</mi></msub><mo>≈</mo><mfrac><msub><mi>sample</mi><mi>original</mi></msub><msup><mn>2</mn><mn>4</mn></msup></mfrac></math>
- <math><msub><mi>sample</mi><mi>2</mi></msub><mo>≈</mo><msub><mi>sample</mi><mi>original</mi></msub><mo>-</mo><msup><mn>2</mn><mn>4</mn></msup><mo>×</mo><msub><mi>sample</mi><mi>1</mi></msub><mo>+</mo><msup><mn>2</mn><mn>7</mn></msup></math>
- <math><mn>0</mn><mo>≤</mo><msub><mi>sample</mi><mi>1</mi></msub><mo>&lt;</mo><msup><mn>2</mn><mn>12</mn></msup></math>
- <math><mn>0</mn><mo>≤</mo><msub><mi>sample</mi><mi>2</mi></msub><mo>&lt;</mo><msup><mn>2</mn><mn>8</mn></msup></math>
- <math><mn>0</mn><mo>≤</mo><msup><mn>2</mn><mn>4</mn></msup><mo>×</mo><msub><mi>sample</mi><mi>1</mi></msub><mo>+</mo><msub><mi>sample</mi><mi>2</mi></msub><mo>-</mo><msup><mn>2</mn><mn>7</mn></msup><mo>&lt;</mo><msup><mn>2</mn><mn>16</mn></msup></math><br><p class="note" role="note"><span class="marker">NOTE:</span> Files that do not respect this constraint will still decode successfully because Clause [[#sample-transform-definition]] mandates the resulting values to be each clamped to fit in the number of bits per sample as defined by the <code>'[=pixi=]'</code> property of the reconstructed image item.</p>
- A [=Sample Transform Derived Image Item=] with the two items above as input in this order,<br>and its <code>'[=pixi=]'</code> property with <code>[=bits_per_channel=]</code>=16,<br>and the following <code>[=SampleTransform=]</code> fields:
- <code>[=sato/version=]</code>=0
- <code>[=sato/bit_depth=]</code>=2 (signed 32-bit <code>[=sato/constant=]</code>s, stack values and intermediate results)
- <code>[=sato/token_count=]</code>=7
- <code>[=sato/token=]</code>=0, <code>[=sato/constant=]</code>=16
- <code>[=sato/token=]</code>=1 (sample from 1<sup>st</sup> input image item)
- <code>[=sato/token=]</code>=130 (product)
- <code>[=sato/token=]</code>=2 (sample from 2<sup>nd</sup> input image item)
- <code>[=sato/token=]</code>=128 (sum)
- <code>[=sato/token=]</code>=0, <code>[=sato/constant=]</code>=-128
- <code>[=sato/token=]</code>=128 (sum)

This is equivalent to the following postfix notation (parentheses for clarity):

<math><msub><mi>sample</mi><mi>output</mi></msub><mo>=</mo><mo>(</mo><mo>(</mo><mn>16</mn><mspace width="1ch"/><msub><mi>sample</mi><mn>1</mn></msub><mo>×</mo><mo>)</mo><mspace width="1ch"/><msub><mi>sample</mi><mn>2</mn></msub><mo>+</mo><mo>)</mo><mn>128</mn><mo>-</mo></math>

This is equivalent to the following infix notation:

<math><msub><mi>sample</mi><mi>output</mi></msub><mo>=</mo><mn>16</mn><mo>×</mo><msub><mi>sample</mi><mn>1</mn></msub><mo>+</mo><msub><mi>sample</mi><mn>2</mn></msub><mo>-</mo><mn>128</mn></math>

Each output sample is equal to the sum of a sample of the first input image item shifted to the left by 4 and of a sample of the second input image item offset by -128. This can be viewed as a bit depth extension of the first input image item by the second input image item which contains the residuals to correct the precision loss of the first input image item.
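As a non-normative illustration under the constraints listed above, the following Python sketch derives the two input samples on the encoder side and reconstructs the 16-bit value on the decoder side. The <code>encode_decode_12bit</code> callable is a purely hypothetical stand-in for an AV1 encode/decode round trip of the first input image item.

```python
def clamp(value, lo, hi):
    return max(lo, min(value, hi))

def split_residual(sample_original, encode_decode_12bit):
    # Encoder side: a 12-bit lossy base plus an 8-bit residual centred on 128.
    sample_1 = encode_decode_12bit(sample_original >> 4)  # ≈ sample_original / 2^4
    sample_2 = clamp(sample_original - 16 * sample_1 + 128, 0, 255)
    return sample_1, sample_2

def reconstruct(sample_1, sample_2, bits_per_channel=16):
    # Decoder side: the infix form of the token list above, followed by the
    # clamping mandated by [[#sample-transform-definition]] for the
    # reconstructed image item.
    return clamp(16 * sample_1 + sample_2 - 128, 0, (1 << bits_per_channel) - 1)

# With a faithful stand-in for the lossy base, the residual fully corrects the
# 4 bits of precision dropped by the right shift.
s1, s2 = split_residual(0xBEEF, encode_decode_12bit=lambda v: v)
assert reconstruct(s1, s2) == 0xBEEF
```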

NOTE: If the first input image item is the [=primary image item=] and is enclosed in an <code>'[=AVIF/altr=]'</code> group with the derived image item, the first input image item is also a backward-compatible 12-bit regular coded image item that can be used by decoding contexts that do not support [=Sample Transform Derived Image Items=] or do not need extra precision.

NOTE: The first input image item supports reasonable loss during encoding because the second input image item "overlaps" it by 4 bits and corrects that loss. The second input image item also supports reasonable loss during encoding.

NOTE: This pattern can be used for reconstructed bit depths beyond 16 by combining more than two input image items or with various input bit depth configurations and operations.