GTFS-Translations #180

LeoFrachet · 2019-08-08T01:55:40Z

As explained in issue #138 (Jan 29th, 2019) and then in issue #175 (Jul 28th, 2019), we drafted a GTFS-Translations proposal (bit.ly/gtfs-translations), which is based on Google's old private GTFS translation extension.

Since then, and after a few modifications to the proposal (see the Google doc), Google has switched to using it internally, deprecating their old private GTFS translation extension, as described in their documentation (here).

I'm opening a pull request with the current (2019-08-07T22:00:00-04:00) state of the Google Doc.

Google has already been consuming it for quite a while. What's currently missing before we can open the vote is a producer.

aababilov · 2019-08-08T02:05:51Z

+1 from Google. We have a big provider that supplies more than 100 feeds in several countries and uses the GTFS-Translations spec.

aababilov · 2019-08-08T02:13:44Z

Here is a feed that Google gets from our producer for the city of Lviv in Ukraine:
https://drive.google.com/open?id=1qGuy5Y-jJvGy2fHU6h4Jv_zdGWNbTnEH

flocsy · 2019-08-08T04:38:37Z

The default language (per dataset) is not clear to me. Shouldn't we allow a different default language per record? If, for example, a dataset covers the whole of Switzerland, what would the default language be? To me it sounds like Zürich (de) should probably be the default for Zurich, but Genève (fr) the default for Geneva.

LeoFrachet · 2019-08-08T15:39:54Z

@flocsy: Yes, this is the example cited in the definition of feed_lang. When the default language varies from place to place, you can define it as mul and provide the local version for every place:

If the dataset contains values in multiple languages (e.g. in multilingual countries like Switzerland, Belgium or Canada), the norm ISO 639-2 contains the language code “mul” to describe such reality. In such case, the best practice is to provide a translation for each of the languages used in the dataset.

For example, a dataset in Switzerland will have feed_lang=mul and will contain by default stop names “Genève” for Geneva, “Zürich” for Zurich and “Biel/Bienne” for the bilingual city of Biel/Bienne. But translations will be provided, in German: “Genf”, “Zürich” and “Biel”; in French: “Genève”, “Zurich” and “Bienne”; in Italian: “Ginevra”, “Zurigo” and “Bienna”; and in English: “Geneva”, “Zurich” and “Biel/Bienne”.

If what you're suggesting is that we attach default_lang information to subsections of the feed, like agency or stop, that would be doable, but I don't see the added value.
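As a rough illustration of the Switzerland example from the consumer side, here is a minimal Python sketch (not part of the proposal) that resolves a stop name for a requested language and falls back to the untranslated stops.txt value, which in a feed_lang=mul dataset is the local name. It assumes only the table_name, field_name, language, translation and record_id columns of translations.txt (record_sub_id and field_value are ignored); the stop_id values and the helper names `load_translations` and `display_name` are hypothetical.

```python
import csv
import io

# Hypothetical stops.txt for a feed_lang=mul dataset: each name is in its area's local language.
STOPS_TXT = """stop_id,stop_name
geneva,Genève
zurich,Zürich
biel,Biel/Bienne
"""

# Hypothetical translations.txt rows keyed by record_id (the stop_id); only en and de shown.
TRANSLATIONS_TXT = """table_name,field_name,language,translation,record_id
stops,stop_name,en,Geneva,geneva
stops,stop_name,en,Zurich,zurich
stops,stop_name,en,Biel/Bienne,biel
stops,stop_name,de,Genf,geneva
stops,stop_name,de,Zürich,zurich
stops,stop_name,de,Biel,biel
"""


def load_translations(text):
    """Index translations by (table_name, field_name, record_id, language)."""
    index = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["table_name"], row["field_name"], row["record_id"], row["language"])
        index[key] = row["translation"]
    return index


def display_name(stop, lang, translations):
    """Return the stop name in `lang`, falling back to the untranslated stops.txt value."""
    key = ("stops", "stop_name", stop["stop_id"], lang)
    return translations.get(key, stop["stop_name"])


stops = list(csv.DictReader(io.StringIO(STOPS_TXT)))
translations = load_translations(TRANSLATIONS_TXT)
print([display_name(s, "de", translations) for s in stops])  # ['Genf', 'Zürich', 'Biel']
# No Italian rows in this sketch, so the local names are returned unchanged.
print([display_name(s, "it", translations) for s in stops])  # ['Genève', 'Zürich', 'Biel/Bienne']
```

Falling back to the untranslated value is just one possible consumer choice; how translations are displayed is ultimately up to each consumer.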

flocsy · 2019-08-08T20:53:44Z

Ah, OK, I wasn't aware of the "mul" standard; I thought it meant something else. Maybe you should emphasize it so everyone understands. To help you see where to rephrase, this is what I thought it meant: usually feeds have only one language, so they would have feed_lang=hu (which would also mean there's no translations.txt), whereas if there is a translations.txt, then feed_lang must be set to "mul". Now I understand that the two (the existence of translations.txt and feed_lang=mul) are not necessarily connected.

Regarding the added value of default_lang per subsection: no, per section it wouldn't give any useful info, I agree. But I would maybe like to see an optional "lang" field in all the tables that can be translated, which would only be used/useful if default_lang="mul". It really depends on the consumer apps... but I can say that in a place that uses Latin letters I might prefer to see the local names (Geneva, Zürich), 'cause that's the way I will most probably see/hear them, so why display them in English just because my phone's language is set to English, Hungarian or Hebrew? On the other hand, in a place that uses non-Latin letters I am not able to read the local names, so I'd prefer English. I agree that for the 2 examples I gave it's not necessary to know the default language of each record, but it might be useful.


-- Gavriel Fleischer

LeoFrachet · 2019-08-12T15:33:53Z

But I would maybe like to see an optional field: "lang" in all the tables that can be translated, and it would only be used/useful if the default_lang="mul".

It would be useful for some, but sometimes a single stop name is already mul (Biel/Bienne). If there is a producer and a consumer interested in such a feature, let us know. But I would rather keep that for a later proposal, since it would work as an extension of the current one.

Well it really depends on the consumer apps... but I can say that in a place where they have latin letters I might prefer to see the local names (Geneva, Zürich), 'cause that's the way I most probably will see/hear it, so why displaying it in English, just because my phone's language is set to English, Hungarian, Hebrew.

Indeed, it depends on the consumer apps. There would be a lot to say about how those translated fields should be filled (e.g. should "Köln" be translated into English as "Cologne"? "Köln (Cologne)"?), but IMHO it is up to the data producer to produce them, and up to the consumer to decide how to display them.

Maybe some guidelines will be useful down the road if we see inconsistent behavior.

LeoFrachet · 2019-08-15T14:15:00Z

I'm opening the vote on this proposal. The vote will be open until next Thursday the 22nd, 23:59:59 UTC.

flocsy · 2019-08-15T14:43:22Z

I would still like to change the following sentence to make it clearer:

If the dataset contains values in multiple languages (e.g. in multilingual countries like Switzerland, Belgium or Canada), the norm ISO 639-2 contains the language code “mul” to describe such reality. In such case, the best practice is to provide a translation for each of the languages used in the dataset.

It's unclear IMHO what "dataset contains values in multiple languages" means. As I read it, this means that there is more than one language in the DATASET. However, if the default is "en" and I provide a translation into "fr", then there is no need for "mul".

I would suggest something like:

If the default values in the dataset contain values in multiple languages (e.g. in multilingual countries like Switzerland, Belgium or Canada in stops.txt you have more than one language), the norm ISO 639-2 contains the language code “mul” to describe such reality. In such case, the best practice is to provide a translation for each of the languages used in the dataset. If all the labels in stops.txt are in one language, and there are translations in translations.txt, then "mul" is not to be use.

I'm sure the English speakers can improve it even further, I'd like it to be as explicit as possible.

LeoFrachet · 2019-08-15T15:13:33Z

Thanks @flocsy for the suggested language. I'm adding a slightly altered version of your proposal:

If the untranslated values in the dataset are in multiple languages (e.g. in multilingual countries like Switzerland, Belgium or Canada the stop_name in stops.txt will be by default in different languages depending of the area), the feed_lang field should contain the language code mul defined by the norm ISO 639-2 to describe such situation. In such case, the best practice is to provide a translation for each of the languages used in the dataset. If all the untranslated values in the dataset are in the same language, then "mul" should not to be use.

LeoFrachet · 2019-08-15T15:16:37Z

Since nobody has voted since I opened the vote, and since we changed the phrasing, I'm closing and reopening the vote.

The vote will be open until Thursday the 22nd, 23:59:59 UTC.

aababilov · 2019-08-15T21:55:16Z

+1 from Google.

flocsy · 2019-08-15T22:19:59Z

+1

abyrd · 2019-08-16T06:54:02Z

It took me a while to understand the text "If the untranslated values in the dataset are in multiple languages (e.g. in multilingual countries like Switzerland, Belgium or Canada the stop_name in stops.txt will be by default in different languages depending of the area)". I think it will not be immediately apparent to many readers what this means.

The expressions "untranslated values are in multiple languages" and "depending on the area" are ambiguous. At first I thought this was describing some kind of system that reacted to the location of the reader or consumer and extracted language-specific sub-values out of multi-lingual individual records.

Here is my attempt at a rewrite (also correcting some small errors with prepositions etc.):

Datasets may contain untranslated values in multiple languages. For example, in a multilingual country like Switzerland, Belgium, or Canada the stop_name field of each stop could be in a different language, depending on the dominant language in that stop's geographic location. In such cases, the feed_lang field should contain the language code mul defined by the norm ISO 639-2. The best practice here is to provide a translation for each of the languages used in the dataset. If all the untranslated values in the dataset are in the same language, then "mul" should not be used.

Though the comments mention putting off the stop_name="Biel/Bienne" case for the future, the proposal in its current form covers both known use cases of the mul language code (a single language per record, and multiple languages within a single field). They both seem like valid interpretations of mul to me, so I'm happy to see both added to the spec.

flocsy · 2019-12-02T23:56:02Z

So is it up for vote again?
+1 (Moovit)

aababilov · 2019-12-02T23:58:06Z

+1 (Google)

prhod · 2019-12-05T05:00:26Z

+1 (Kisio)

timMillet · 2019-12-09T21:01:14Z

@flocsy and @aababilov
My mistake: the vote is not open again. @prhod told me that my post on the GTFS Google Group about the vote for GTFS-Attributions redirected to the wrong place. I am very sorry about that.

timMillet · 2019-12-09T21:21:55Z

@flocsy and @abyrd
I worked on an improved proposal for extending the feed_info.feed_lang field description. The goals were:

- making it similar to the way definitions are formatted within the GTFS reference guide: general description first, then an example in italics (e.g. the descriptions of calendar_dates.exception_type or stop_times.shape_dist_traveled).
- making the use case of multilingual datasets easier to understand, based on all your comments, both in the general description and in the example section.

Below would be the whole field description:

Default language used for the text in this dataset. This setting helps GTFS consumers choose capitalization rules and other language-specific settings for the dataset. The file translations.txt can be used if the text needs to be translated into languages other than the default one.

The default language may be multilingual for datasets with the original text in multiple languages. In such cases, the feed_lang field should contain the language code mul defined by the norm ISO 639-2. The best practice here would be to provide, in translations.txt, a translation for each language used throughout the dataset. If all the original text in the dataset is in the same language, then mul should not be used.

Example: Consider a dataset from a multilingual country like Switzerland, with the original stops.stop_name field populated with stop names in different languages. Each stop name is written according to the dominant language in that stop’s geographic location, e.g. Genève for the French-speaking city of Geneva, Zürich for the German-speaking city of Zurich, and Biel/Bienne for the bilingual city of Biel/Bienne. The dataset feed_lang should be mul and translations would be provided in translations.txt, in German: Genf, Zürich, and Biel; in French: Genève, Zurich, and Bienne; in Italian: Ginevra, Zurigo, and Bienna; and in English: Geneva, Zurich, and Biel/Bienne.
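A producer applying this rule could decide feed_lang roughly as in the following Python sketch. It rests on one loudly stated assumption: the producer already knows which languages its original (untranslated) text is written in, since GTFS does not record a language per value. The function names `choose_feed_lang` and `missing_translations` are hypothetical, not part of the spec.

```python
def choose_feed_lang(source_langs):
    """Pick a feed_info.feed_lang value from the languages of the original text.

    Hypothetical helper: source_langs lists the language codes the producer
    knows its untranslated values are written in (GTFS does not store this).
    """
    langs = set(source_langs)
    if not langs:
        raise ValueError("at least one source language is required")
    if len(langs) == 1:
        # All original text is in one language, so use that code rather than mul.
        return next(iter(langs))
    # Original text mixes languages (e.g. fr and de stop names in Switzerland).
    return "mul"


def missing_translations(source_langs, translated_langs):
    """Best-practice check: source languages that have no translations.txt rows yet."""
    return sorted(set(source_langs) - set(translated_langs))


print(choose_feed_lang(["fr", "de", "de"]))  # mul
print(choose_feed_lang(["en", "en"]))  # en
print(missing_translations(["fr", "de"], ["en"]))  # ['de', 'fr']
```

The second helper mirrors the best practice of providing, in translations.txt, a translation for each language used throughout a mul dataset.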

Please, don’t hesitate to provide any feedback!

flocsy · 2019-12-10T16:27:07Z

this is clear to me!

timMillet · 2019-12-16T19:15:09Z

Since both a producer and a consumer have implemented translations.txt as put forward by this pull request, and a consensus has been reached on the feed_info.feed_lang description, I am re-opening the vote.

Producer: EasyWay with Lviv’s dataset
Consumer: Google

The vote will be open until Monday, December 23rd at 23:59:59 UTC.
Gavriel (@flocsy), Alexej (@aababilov), Pascal (@prhod): don’t hesitate to vote if you want to.

skinkie · 2019-12-16T19:21:36Z

+1 Stichting OpenGeo / Bliksem Labs

aababilov · 2019-12-16T22:18:11Z

+1 from Google.

gcamp · 2019-12-16T22:28:09Z

+1 from Transit

prhod · 2019-12-17T05:35:22Z

+1 from Kisio

nighthawk · 2019-12-17T06:17:58Z

+1 from SkedGo

flocsy · 2019-12-17T07:33:20Z

+1 (Moovit)

tsherlockcraig · 2019-12-17T16:41:05Z

+1 from Trillium

LeoFrachet · 2020-01-09T14:20:26Z

The vote is closed.

We have 6 votes in favor. Zero against.

We have a producer and a consumer.

So the proposal is adopted 🎉 !

Léo Frachet added 3 commits August 7, 2019 16:54

GTFS-Translations (without record_sub_id and field_value)

6fdfe80

Add fields record_sub_id and field_value

310f424

fix typos

90d6c87

googlebot added the cla: yes label Aug 8, 2019

LeoFrachet mentioned this pull request Aug 8, 2019

GTFS-Translations #175

Closed

Adding pathways and levels.

3ea014c

Improve "mul" explaination

d10a5d6

come25136 mentioned this pull request Oct 20, 2019

Support for the new GTFS-Translations format come25136/gtfs#1

Closed

Change feed_info.feed_lang definition

7887469

LeoFrachet merged commit bc3d042 into google:master Jan 9, 2020

LeoFrachet deleted the extension/gtfs-translations branch January 9, 2020 14:20

karimhm mentioned this pull request May 11, 2020

Supporting translation ibi-group/datatools-ui#576

Open

scmcca added the proposal and GTFS Schedule labels May 20, 2022

This was referenced May 26, 2022

translations.txt: translation should not be a primary key #326

Merged

translations.txt: specify which translation takes precedence over another #327

Merged

Sergiodero mentioned this pull request Mar 1, 2024

Clarification on language code data standards used in translations.txt #435

Open

isabelle-dr mentioned this pull request May 22, 2024

Update requirement for feed_info.txt #460

Merged
