Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the TrackEntry\Language mandatory #451

Merged
merged 1 commit into from
Mar 23, 2021
Merged

Conversation

robUx4
Copy link
Contributor

@robUx4 robUx4 commented Feb 7, 2021

Following the bug/inconsistency found in EBML [1] it might be better to make
the track language mandatory. It will give better past/present compatibility.

In light of the next Track Selection explanation it's also vital to know the
language when you have more than one audio/subtitle track.

This may be a problem for video tracks that were not mandatory and technically
didn't have a defined language. The flag does make sense though, even for video.
For example some text on the screen may appear differently between languages.

This change will turn every track of every file that doesn't have this value set
assume it's in English. That's probably what any system dealing with track selection
is already doing anyway given it was the official default value.

[1] ietf-wg-cellar/ebml-specification#394

Following the bug/inconsistency found in EBML [1] it might be better to make
the track language mandatory. It will give better past/present compatibility.

In light of the next Track Selection explanation it's also vital to know the
language when you have more than one audio/subtitle track.

This may be a problem for video tracks that were not mandatory and technically
didn't have a defined language. The flag does make sense though, even for video.
For example some text on the screen may appear differently between languages.

This change will turn every track of every file that doesn't have this value set
assume it's in English. That's probably what any system dealing with track selection
is already doing anyway given it was the official default value.

[1] ietf-wg-cellar/ebml-specification#394
@robUx4 robUx4 added bug enhancement spec_main Main Matroska spec document target labels Feb 7, 2021
@hubblec4
Copy link
Contributor

hubblec4 commented Feb 7, 2021

I think this a simple solution, but not the best.
We have more track types (buttons, control and so on.) which are then all automatically have a language set to English.

I have read all the other issues and PRs to this topic and yes there are many tools/apps out which handle mandatory and default value not correct.

Like MKVToolNix which don't write the track-language element if the value is eng, but this is wrong. The header of the track-language element must be written, but it's possible to omit the value.

And when reading the track section and there is no language element, then the language is undefined and not English(eng).

@mbunkus
Copy link
Contributor

mbunkus commented Feb 7, 2021

yes there are many tools/apps out which handle mandatory and default value not correct.

I know you didn't mean it like that, but I find this somewhat insulting. All those programs worked spec-compliant for close to 18 years. Their authors followed what we told them how to handle mandatory elements that aren't present. We just changed the specs under them for no good reason and suddenly they all have to do more work and deal with bug reports and ambiguity.

From my point of view it's the other way around: the EBML RFC is buggy and doesn't follow the what's been established practice for 18 years.

@hubblec4
Copy link
Contributor

hubblec4 commented Mar 4, 2021

ready to merge?
or are there any objections?

@mbunkus
Copy link
Contributor

mbunkus commented Mar 4, 2021

Dang, I thought I had approved this one already. Ready from my point of view.

@hubblec4
Copy link
Contributor

hubblec4 commented Mar 4, 2021

I thought I had approved this one already.

Yes we talked about it at the Matroska meeting and we agreed, but in this PR are 3(now 2) approval was left.

@JeromeMartinez
Copy link
Contributor

This may be a problem for video tracks that were not mandatory and technically didn't have a defined language

It sometimes has a language, you can have the star wars intro in several languages or japanime with signs in japanese or translated.

As I understand:

  • if no element, it is same as if element is present with a value "eng"
  • if someone wants to say that there is really no language associated (nothing written anywhere on a a video), an 0-sized element is put in the file.
    Correct?

@mkver
Copy link
Contributor

mkver commented Mar 4, 2021

This may be a problem for video tracks that were not mandatory and technically didn't have a defined language

It sometimes has a language, you can have the star wars intro in several languages or japanime with signs in japanese or translated.

As I understand:

* if no element, it is same as if element is present with a value "eng"

* if someone wants to say that there is really no language associated (nothing written anywhere on a a video), an 0-sized element is put in the file.
  Correct?

No. According to the new rules, such an element (called Empty Element in EBML parlance) will be read as the element's default value (i.e. "eng"). If you want to say that there is no language associated, you use either "und" (undetermined) or "zxx" (no linguistic content, no applicable). The latter is more correct (the former actually says that it has a language, but no one bothered to look at it in order to find out what language it is), the latter far more widespread.

There are btw three more elements that need to be made (partially) mandatory in order not to break old files: DisplayUnit and DisplayWidth and DisplayHeight (the latter two only in case DisplayUnit is zero).

@mbunkus
Copy link
Contributor

mbunkus commented Mar 4, 2021

For parsers that act before-spec-change-like:

  • Element is present and set to something = that something is used
  • Element is NOT present = eng is used
  • Element is present with zero size = implementation-defined as this hadn't been specced before. I couldn't even tell you what mkvmerge does with that at the moment.

Signalling "there is no linguistic content" or "the language is unknown" or "there are multiple languages" can all be done with special ISO 639-2 codes:

  • zxx = no linguistic content; not applicable
  • und = undetermined
  • mul = multiple languages

The "I don't know!" situation would call for using und.

@JeromeMartinez
Copy link
Contributor

The "I don't know!" situation would call for using und.

We are lucky here that there is a value for "I don't know!" but it may not be the case for other strings, so IIUC we'll have to be careful when setting a non 0-sized default value for string, in the case of that a 0-sized string makes sense for an element.

@hubblec4
Copy link
Contributor

hubblec4 commented Mar 4, 2021

Element is present with zero size = implementation-defined as this hadn't been specced before. I couldn't even tell you what mkvmerge does with that at the moment.

MTX reads all fine, in the merge tab the value "und" is used, same in the header editor.

TrackLanguage_Element_ZeroSize.zip

@JeromeMartinez
Copy link
Contributor

MTX reads all fine, in the merge tab the value "und" is used, same in the header editor.

So it is now the wrong behavior, do I understand correctly now?

@hubblec4
Copy link
Contributor

hubblec4 commented Mar 4, 2021

Mmmh... yes, "eng" should be used -> new RFC specs
But this breaks older mkv's and softwares, for that we should add the mandatory attribute.

@mkver
Copy link
Contributor

mkver commented Mar 4, 2021

MTX reads all fine, in the merge tab the value "und" is used, same in the header editor.

So it is now the wrong behavior, do I understand correctly now?

Indeed.

Mmmh... yes, "eng" should be used -> new RFC specs
But this breaks older mkv's and softwares, for that we should add the mandatory attribute.

It can't break compliant older files: All strings with a default value must be in ISO-639-2 form; an empty string is not.

@hubblec4
Copy link
Contributor

hubblec4 commented Mar 4, 2021

But this breaks older mkv's and softwares, for that we should add the mandatory attribute.

But also with a mandatory attribute a Matroska writer can write only the element header with a zero-size and a parser have to use "eng".

@mkver
Copy link
Contributor

mkver commented Mar 4, 2021

But this breaks older mkv's and softwares, for that we should add the mandatory attribute.

But also with a mandatory attribute a Matroska writer can write only the element header with a zero-size and a parser have to use "eng".

Yes, if this feature is used (which makes no sense for a mandatory element), then old parsers are broken.

(I presume your post was supposed to be a reply to my comment on your post that you cited?)

@hubblec4
Copy link
Contributor

hubblec4 commented Mar 4, 2021

(I presume your post was supposed to be a reply to my comment on your post that you cited?)

No. I have comment my own words, because I recognized that adding a mandatory attribute not resolves the issue with zero-size elements for some softwares.

@robUx4
Copy link
Contributor Author

robUx4 commented Mar 6, 2021

There are btw three more elements that need to be made (partially) mandatory in order not to break old files: DisplayUnit and DisplayWidth and DisplayHeight (the latter two only in case DisplayUnit is zero).

Yes. We're starting with the track language as it's the most common and problematic one. It's also a test to check what is the impact of such a change before we generalize it.

@robUx4
Copy link
Contributor Author

robUx4 commented Mar 6, 2021

But this breaks older mkv's and softwares, for that we should add the mandatory attribute.

But also with a mandatory attribute a Matroska writer can write only the element header with a zero-size and a parser have to use "eng".

We already established that the "zero-size" feature is probably not used in the wild, so IMO that's not a big deal if we break it. Current implementation use an empty string in this case which, as @mkver mentioned, is not a valid language string. So in any case such a software doesn't read a proper value anyway (or if it's half smart it detects the empty string and use the default value for that element in the end).

As I said, in VLC this is already the case where the language is set to "eng" by default, even before the Track is read. This is the case in libavformat as well. That covers most desktop readers.

There are also web browsers (WebM) that can parse this element as well. But I'm not aware they can select a track based on its language so even if they read the element (wrong) it's not used. It's mostly informational in the file.

I'm not sure how this is handled in the default Android media player, is it possible to select the track based on the language ?

Then there are the other appliances (TVs, Blu-ray players, media streamers) that can read .MKV files. The good ones can show the language of each track so the user can select it. I have no idea if they are using their own parser, a parser they buy with the chip (along with the codec handling code) or use libavformat. We can eliminate the case of 'zero-size' because those files don't exist anyway. This is likely untested and/or unhandled.

The issue is when the element is not present. Did they assume it was "eng" or "und" ? We can't test everything and there's no one to ask. So we can just guess. I think by default they assume it's "eng" for the same reason libavformat and VLC initialize their variable to "eng". This was always the default value. And it's easier to handle than having a flag next to this variable telling if it was actually set in the file and we kept the default. And then base the track selection and display based on this. I think it's unlikely anyone went through this level of refinement (until now).

And a video track with a set language is also very unlikely to be any use to them either. It's really only used for audio and subtitles.

So given all that, I think making the TrackEntry\Language mandatory will have no impact in the real world.

@mbunkus
Copy link
Contributor

mbunkus commented Mar 6, 2021

So given all that, I think making the TrackEntry\Language mandatory will have no impact in the real world.

Totally agree.

@hubblec4
Copy link
Contributor

hubblec4 commented Mar 6, 2021

2 missing approvals
and then we can merge this PR?

@JeromeMartinez JeromeMartinez merged commit f555c7b into master Mar 23, 2021
@robUx4 robUx4 deleted the mandatory-lang branch March 23, 2021 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug enhancement spec_main Main Matroska spec document target
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants