Interaction between "represents" and "script event type" #227

cconcolato · 2024-04-11T15:33:18Z

As discussed in the TTWG call on April 11, 2024, during the review of #217, we wonder what the interactions are between the new daptm:represents and daptm:eventType. Both attributes are registry-based.


nigelmegitt · 2024-04-12T08:31:22Z

Reproducing and rephrasing the options we discussed here for ease of access:

Allow the event type values to be coalesced into represents at the document level, i.e. have a single registry table for both the values of <content-descriptor> and daptm:eventType.
Add a mapping from the values allowed in event type into a simpler smaller set of represents values, e.g. title and OnScreenText in daptm:eventType both map to visualText in daptm:represents.
Replace daptm:eventType with daptm:represents and use the same registry table for both.
The nuance is that represents allows a list, whereas eventType maybe should be a single value.
Do not have the document level summary at all, but inspect the document contents to see what it contains, i.e. remove daptm:represents.

nigelmegitt · 2024-09-10T16:27:26Z

I've looked at this and my conclusions are:

We can use represents for both document level and script event level descriptive data, with the same registry.
There may be value in being more specific at script event level than at document level, so we could add values into the registry that aren't expected to be used at document level. For example, "visualText" might also be "credit" or "location", but it might be reasonable to put only "visualText" at document level.

I wonder if it's worth putting an extra field into the registry to say "is a sub-type of", which would point to another row in the registry. For example:

Value	Is sub-type of
visualText	-
credit	visualText
location	visualText
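
A minimal sketch of how an "is sub-type of" column could be consulted, assuming the registry is loaded as a simple mapping from each value to its parent. The `REGISTRY` dictionary and the `is_same_or_subtype` helper are hypothetical names for illustration only, not part of any specification text:

```python
# Hypothetical in-memory copy of the registry table above:
# value -> "is sub-type of" (None means the value has no parent).
REGISTRY = {
    "visualText": None,
    "credit": "visualText",
    "location": "visualText",
}


def is_same_or_subtype(value, ancestor, registry=REGISTRY):
    """Return True if `value` equals `ancestor` or descends from it."""
    seen = set()  # guards against accidental cycles in the registry
    current = value
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = registry.get(current)
    return False
```

With these rows, `is_same_or_subtype("location", "visualText")` is true, while the reverse lookup is false.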

nigelmegitt · 2024-09-12T13:55:02Z

#241 opened to resolve this issue - as it stands in that pull request there's no formal linkage between values and sub-types as proposed in #227 (comment) - I just put the linkage informally into the description. It would be an easy incremental fix to change that.

nigelmegitt · 2024-09-13T11:54:31Z

See #241 (comment) for the IRC log of what I scribed during a discussion of the related pull request during yesterday's TTWG call. I'm not sure I did all that well at scribing, so while it's relatively fresh, here are some notes and additional thoughts:

Initial proposal in #241 at `3681360`

The initial proposal is a single daptm:represents attribute (a "shared property" called Represents) that must be applied at the document level, on the <tt> element, and may be applied at the Script Event level, on the <div> element.

In each case, the attribute describes what the object represents, and is a whitespace separated list of terms in the registry defined for <content-descriptor>.

At the document level, it says "here are the parts of the related media that the contents of this document represent" - I'd expect a short list of high level terms, probably not down to the level of "includes the titles, the credits and locations", more likely "visualText", but that's for implementers and users to decide.
At the script event level, it says "here are the parts of the related media that the Text(s) of this script event represents, during the time interval that this script event is active" - I'd expect a more granular set of terms, probably not as generic as "visualText", more likely something specific like "dialogue" or "location", but again, for implementers and users to decide.

This is very open, and there is no implication of inheritance or any enforcement of constraints about what is on the document level vs the script event level. It is the simplest design I could think of, as a first step.

Summary of comments and discussion

Agreement with the approach that we should have one value set (registry table) for represents and not have a separate one for Script Event Type;
Expectation that there should be inheritance, to reduce verbosity;
Expectation that the inheritance model should be like xml:lang or daptm:langSrc;
Expectation that there should be constraints preventing clashes between the document level signalling and the Script Event level signalling;
Query about whether represents should be mandatory on Script Events;
Query about whether a Shared Property is the right approach, or if it should be like Text Language Source, with an inheritance model, an applies to and a set of elements on which it may be specified;
Query about whether there should be two different attributes that both make use of the <content-descriptor> component, but on the document level a list of multiple values would be permitted, and at the script event level a single value would be permitted;
Bug report (I noticed) that the syntax for daptm:represents has optional (0..many) white space between list terms, that should be mandatory (1..many).

Inheritance model (my thoughts)

At the document level the content descriptors can be mutually exclusive ("this document represents both dialogue and on screen text") but at the script event level that makes no sense.

For that reason, a blanket top-down inheritance model would not work in a large number of cases.

Some attributes, like ttm:role have an inheritance model that is additive, where an element inherits the role set from its parent, and can additionally specify more roles. For others, like xml:lang it's a replacement, where the inherited value is entirely overwritten, but applies to the descendants that do not specify a different value.
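
To make the difference between the two models concrete, here is a rough sketch that assumes a toy tree of nodes rather than a real TTML parser. The `Node` class and both resolver functions are illustrative assumptions, and the term values are only examples:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    values: Optional[List[str]] = None  # None: attribute not specified here
    parent: Optional["Node"] = None


def resolve_replacement(node):
    """Replacement model: the nearest specified value wins outright."""
    while node is not None:
        if node.values is not None:
            return list(node.values)
        node = node.parent
    return []


def resolve_additive(node):
    """Additive model: collect values from the root down, without duplicates."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    result = []
    for ancestor in reversed(chain):
        for value in ancestor.values or []:
            if value not in result:
                result.append(value)
    return result


tt = Node("tt", ["dialogue", "visualText"])
body = Node("body", parent=tt)
div = Node("div", ["location"], parent=body)
```

Under the replacement model the `div` resolves to just `location` and the `body` picks up the document-level list; under the additive model the `div` would carry all three terms, which illustrates why blanket top-down inheritance is awkward here.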

There's no subtractive inheritance model that I'm aware of, though a syntax and semantic could be created, e.g. +credit could be "add credit to the list" and -location could be "remove location from the list".
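
Purely as an illustration of that hypothetical `+term` / `-term` syntax (it is not defined anywhere, and `apply_delta` is an invented name), the semantics might be sketched like this:

```python
def apply_delta(inherited, tokens):
    """Apply hypothetical "+term" / "-term" tokens to an inherited list."""
    result = list(inherited)
    for token in tokens:
        if token.startswith("+"):
            term = token[1:]
            if term not in result:
                result.append(term)  # add the term to the list
        elif token.startswith("-"):
            term = token[1:]
            if term in result:
                result.remove(term)  # remove the term from the list
        else:
            raise ValueError(f"expected '+' or '-' prefix: {token!r}")
    return result
```

For example, applying `["+credit", "-location"]` to an inherited `["visualText", "location"]` leaves `["visualText", "credit"]`.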

It's not common for there to be an inheritance blocking point in the tree, in XML, but this is another potential approach. For example, setting daptm:represents on the <tt> element could be defined as non-inheriting down to the <body> but setting it on the <body> element could be defined as inheriting down the tree from there.

Another way to permit inheritance but resolve this document-vs-event issue is to have the document level metadata apply to the <head> element instead of the <tt> element, since <body> is never a descendant of <head>.

Overall, I'm not against inheritance here, but I don't think it's very useful, and it could increase implementation complexity. It seems simpler to specify that it is not inherited, and be done. That would be a small specification change.

If we make Represents a mandatory property of a Script Event, then inheritance ceases to be useful anyway. This approach could be used to distinguish Script Event <div> elements from other <div> elements that are only used for grouping.

Constraints

The point was made that in a document that claims that it represents only visual content, say, it would be some kind of error if a script event in that document claims that it represents audio content. The idea is that a validator would be able to check that the document level summary does not conflict with the more granular contents.

I also noted that it may be reasonable to claim that the document represents both non-dialogue sounds and dialogue, but for there to be no actual script events that represent non-dialogue sounds. My example was a transcript of a video of a single speaker talking in a quiet room. All the non-dialogue sounds are represented, but it's truly an empty set.

So the constraint seems to be that no Script Event should represent something that isn't in the Document-level represents list.
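
A validator check for that constraint could be as small as a set difference. This is a sketch assuming both attribute values are whitespace separated lists of flat terms; the function name and the term `nonDialogueSounds` are made up for the example:

```python
def undeclared_terms(document_represents, event_represents):
    """Return Script Event terms that are missing from the document-level list."""
    declared = set(document_represents.split())
    return set(event_represents.split()) - declared
```

An empty result means the Script Event is consistent with the document. Note that `undeclared_terms("dialogue nonDialogueSounds", "dialogue")` is empty even if no event ever uses `nonDialogueSounds`, which matches the quiet-room transcript example.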

However, this returns me to the question about sub-typing in the <content-descriptor> registry list. Could it be reasonable to say at the document level daptm:represents="visualText" and at a Script Event level have daptm:represents="location", if location is defined as being a sub-type of visualText? We ran out of time to discuss this point.

Next steps

Fix the syntax bug
Circle back to the requirements in this issue and agree if the current loose approach is good enough, or if we need to add inheritance and/or constraints, and if so, if they can be in separate pull requests or need to be in #241 ("Remove Script Event Type, use Represents instead") before it can be merged.

nigelmegitt · 2024-09-16T15:29:23Z

Just had an idea that will help with the sub-typing complexity issue that I raised earlier, and validity checking that Script Events are okay within the Document, in terms of what they represent.

The idea is: if we make each <content-descriptor> value a sequence of strings separated by a non-whitespace delimiter, then the constraint is that there must be a value in the Document level Represents list that is a prefix of the Script Event Represents value, where the comparison is made on whole strings between delimiters. For example, using a . delimiter, because it seems familiar:

Document Represents	Script Event Represents	Valid?
visual	visual	Yes
audio	visual	No
audio	audio.dialog	Yes
audio.dialog	audio	No
visual.text visual.nonText	visual.text.location	Yes
visual.nonT	visual.nonText	No (need complete term values between the delimiters or at the end)

This allows us to maintain a flat registry structure, without implementations having to worry about what the registry values actually mean.

It also helps for conversion from legacy formats or workflows that don't capture this data in a granular way, in that really generic top level terms can be used everywhere.
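
The prefix rule in the table above can be sketched in a few lines, assuming the Document level value is a whitespace separated list and the Script Event value is a single descriptor. The function name `event_is_valid` is illustrative only:

```python
def event_is_valid(document_represents, event_value, delimiter="."):
    """Check that some document-level descriptor is a whole-term prefix
    of the Script Event descriptor."""
    event_parts = event_value.split(delimiter)
    for doc_value in document_represents.split():
        doc_parts = doc_value.split(delimiter)
        # Compare complete terms, so "visual.nonT" does not match "visual.nonText".
        if event_parts[:len(doc_parts)] == doc_parts:
            return True
    return False
```

Comparing lists of whole terms rather than raw string prefixes is what rules out the last row of the table, and it needs no knowledge of what the registry values mean.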

cconcolato · 2024-09-24T15:48:35Z

A note on the inheritance part. You discuss the additive model (e.g. ttm:role, which is very awkward to me ...) and a hypothetical subtractive model. I don't think we should consider the subtractive model. You did not mention the replacement model (e.g. xml:lang). My suggestion would be to use that model. If the attribute is present at a Script Event level (no matter how many values it has), it replaces the value(s) inherited from the document level.

cconcolato · 2024-09-24T15:50:31Z

> However, this returns me to the question about sub-typing in the <content-descriptor> registry list. Could it be reasonable to say at the document level daptm:represents="visualText" and at a Script Event level have daptm:represents="location", if location is defined as being a sub-type of visualText? We ran out of time to discuss this point.

Yes, that seems reasonable to me.

cconcolato · 2024-09-24T15:54:05Z

Some more thoughts on the registry.

The idea of having hierarchical values with the . prefix seems reasonable to me.

I also wondered how would we allow proprietary values. Would we allow values like x-MyValue? In that case, we need to say so. Would we allow urns, like "urn:vendorX:represents:VendorXSpecialValue"?

I also think we need better guidance around use of descType vs eventType/represents. They are both registry based. When you have a value to register (e.g. some TTAL values), should you use the former or the latter? For example, if one uses a desc element without text content, just a descType attribute, the usage seems similar to adding that value in the eventType. Should we mandate that when a desc element is used with a descType attribute, text content must be present and non-empty, otherwise the descType value should be registered with the eventType registry?

nigelmegitt · 2024-09-24T15:57:10Z

> You did not mention the replacement model (e.g. xml:lang)

Just for the record, I did, and used xml:lang as an example, in the same paragraph as I mentioned ttm:role.

nigelmegitt · 2024-09-24T16:01:26Z

I'm happy to introduce a replacement-style inheritance model, and not make presence of daptm:represents on a <div> element mandatory for Script Events.

This would also mean we have another decision to make. Do we either:

use a different attribute name for the document-level represents attribute on the <tt> element OR
locate the document level represents attribute on or under the <head> element so its value cannot be inherited into <body>?

I think I'm leaning towards the first of these, because that also allows us more easily to specify different syntax requirements, i.e. a list of <content-descriptor>s at the document level, and a single one at the Script Event level.

cconcolato · 2024-09-24T16:12:52Z

> > You did not mention the replacement model (e.g. xml:lang)
>
> Just for the record, I did, and used xml:lang as an example, in the same paragraph as I mentioned ttm:role.

Sorry. Missed that.

nigelmegitt · 2024-09-24T17:00:36Z

> I also wondered how would we allow proprietary values. Would we allow values like x-MyValue? In that case, we need to say so. Would we allow urns, like "urn:vendorX:represents:VendorXSpecialValue"?

Good point, I think we could do that, yes. Are you suggesting that we make urn one of the Registry table values?

The URN question makes me wonder what scenarios we have to consider when choosing a token delimiter. Do we need to allow multiple alternative delimiters? Are there any that would cause problems?

Some possibilities (there are more!):

<lwsp> - I want to reserve this for separating different <content-descriptor>s in a list, so it's not suitable.
. - this was my first thought, but would it cause any difficulties?
: - this would make URNs get split up into tokens, which might be fine
;
/
+

cconcolato · 2024-09-24T17:45:57Z

> Are you suggesting that we make urn one of the Registry table values?

Maybe. I'm not sure. It could get very verbose if you have to set that URN on every event...

cconcolato · 2024-09-24T17:47:23Z

> The URN question makes me wonder what scenarios we have to consider when choosing a token delimiter. Do we need to allow multiple alternative delimiters? Are there any that would cause problems?

I think . is fine. Using : would indeed split a URN and I don't think we want that.

nigelmegitt · 2024-09-24T18:17:06Z

Further to the above, in XML, NMToken is a useful reference point for the tokens, and it permits . and : but not ; (0x3B) or / (0x2F) so I'm tempted to use Name as the token definition and ; as the delimiter.
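
As a rough sketch of that choice, the following splits a descriptor on ; and checks each token against an ASCII-only approximation of the XML 1.0 Name production. The real production also admits many non-ASCII characters, and both `NAME_ASCII` and `split_descriptor` are names invented for this example:

```python
import re

# ASCII-only approximation of XML 1.0 Name: a NameStartChar
# (":", "_", or a letter) followed by NameChars, which additionally
# allow "-", "." and digits. Neither ";" nor "/" is permitted.
NAME_ASCII = re.compile(r"[:A-Z_a-z][:A-Z_a-z0-9.\-]*")


def split_descriptor(descriptor, delimiter=";"):
    """Split a descriptor into tokens and check each one is Name-like."""
    tokens = descriptor.split(delimiter)
    for token in tokens:
        if NAME_ASCII.fullmatch(token) is None:
            raise ValueError(f"not a Name token: {token!r}")
    return tokens
```

Because : and . are legal inside a token, a value such as `urn:vendorX:represents:VendorXSpecialValue` stays in one piece, while `visual;text;location` splits into three tokens.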

nigelmegitt · 2024-09-24T19:04:45Z

> I also think we need better guidance around use of descType vs eventType/represents. They are both registry based. When you have a value to register (e.g. some TTAL values), should you use the former or the latter? For example, if one uses a desc element without text content, just a descType attribute, the usage seems similar to adding that value in the eventType. Should we mandate that when a desc element is used with a descType attribute, text content must be present and non-empty, otherwise the descType value should be registered with the eventType registry?

I think ttm:desc is there to add a description of what the element content is, whereas represents is describing what the content represents within the related media. It's a subtle distinction, but we manage the value sets via the Registries, so we should be able to keep it clean.

If it would help we can add that ttm:desc SHOULD NOT be empty?

nigelmegitt · 2024-09-24T23:43:07Z

I've updated #241.

nigelmegitt added the "CR must-have" label (Must be resolved before going to CR) Jun 6, 2024

nigelmegitt self-assigned this Sep 12, 2024

nigelmegitt mentioned this issue Sep 12, 2024

Remove Script Event Type, use Represents instead #241

Merged

nigelmegitt mentioned this issue Sep 13, 2024

Consider improving identification of divs corresponding to script events #233

Closed

nigelmegitt closed this as completed in #241 Sep 28, 2024
