Support the immediate release of translated chapters #671

Closed
7 of 9 tasks
rviscomi opened this issue Feb 21, 2020 · 14 comments
Labels
development Building the Almanac tech stack translation world wide web

Comments

@rviscomi
Member

rviscomi commented Feb 21, 2020

We've made amazing progress translating the contents of the Almanac into multiple languages. Unfortunately, none of this work is publicly available because the translations are being batched up to be released when the entire contents are ready.

What needs to happen for us to start releasing translated chapters as soon as they're completed?

cc @HTTPArchive/developers @HTTPArchive/translators


Edit by @bazzadp , listing tasks here:

  • Add ability to mark pages (chapters or the non-chapter pages) as available in other languages and if so which languages. Maybe in a similar manner as to how we did the "unedited" note on each chapter. That was originally done as "todo" in the JSON config, but then we moved to unedited in the chapter meta-data but think this would be better stored in the JSON config as otherwise would need to store each variation that's available multiple times in each chapter.
  • Update the translation drop down (currently hidden) to only display when translation exists (as specified in previous point).
  • Remove top "Start Exploring" link as discussed in Optimize website for wide viewports #574 (comment) as not enough space for it once we add the Language switcher (and some of us think it's just wrong anyway 😀)
  • Review Language switcher in all screen sizes and browsers to ensure it is in correct place as not really been tested much and templates have changed a fair bit since it was done.
  • Add missing templates for the other Languages where we have translations. Only the French translation has all the templates needed to be able to generate the HTML for the chapters so only it can launch and the Spanish and Japanese versions can't. To be honest I'd prefer to work on Refactor templates to reduce duplication and assist with Translations #673 first to hopefully make this easier for the Spanish and Japanese translators.
  • Fix Chapter slugs and image directories based on title so don't work for Translations #654 before we can launch some of the French chapters.
  • Decide what we are doing with i18n - Reimplement Almanac logo using text and CSS instead of SVG #567 - maybe live with it as is for initial launch?
  • Fix Migrate translated chapters to new Data Visualisation Formats #542 and Spanish JavaScript chapter: Update note on frameworks section #625 before we can launch the Spanish chapters.
  • I gather visualisations (How to handle chapter visualisations in translations? #558) are out of scope for now and we circle back to them once the text has been translated if there's still volunteers to tackle that?
  • Redirect any missing translations back to English so any references to them in translated chapters don't 404. This should be a 302 (temporary redirect) instead of a 301 (permanent redirect) for SEO purposes. We should also check the cache time of this.
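The last task could be sketched roughly as below, in plain Python. This is only an illustration under assumptions: the `TRANSLATED` lookup, the `resolve` helper and the `/<lang>/<year>/<chapter>` URL shape are hypothetical names for this sketch, not taken from the Almanac codebase.

```python
# Hypothetical lookup of which chapters exist per language (illustrative data only).
TRANSLATED = {
    "en": {"javascript", "css"},
    "fr": {"javascript"},
}

DEFAULT_LANG = "en"


def resolve(lang, year, chapter):
    """Return (status, location) for a chapter request.

    200 with the requested path when the translation exists,
    302 to the English path when it does not (temporary, because the
    translation may appear later), and 404 when the chapter does not
    exist in English either.
    """
    if chapter in TRANSLATED.get(lang, set()):
        return 200, f"/{lang}/{year}/{chapter}"
    if chapter in TRANSLATED[DEFAULT_LANG]:
        return 302, f"/{DEFAULT_LANG}/{year}/{chapter}"
    return 404, None
```

Using a 302 here keeps the decision reversible: once the translation lands, the same URL can start returning 200 without fighting a cached permanent redirect.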
@rviscomi rviscomi added development Building the Almanac tech stack translation world wide web labels Feb 21, 2020
@tunetheweb
Member

tunetheweb commented Feb 22, 2020

Yeah we definitely should look to do this. It's a shame to hold back all the good work of the translations rather than release them as we can (as Wikipedia and MDN do!). Will also hopefully encourage more translations too as it's a lot of work, and I'd imagine not very encouraging when you won't see the fruits of your labour until the end.

Some things I think we'd need to do to allow this to happen:

That's all I can think of for now and, if you agree, then we should probably add them as check boxes to the first comment so we can tick them off as we do them.

@rviscomi
Member Author

rviscomi commented Feb 27, 2020

Add ability to mark pages (chapters or the non-chapter pages) as available in other languages and if so which languages. Maybe in a similar manner as to how we did the "unedited" note on each chapter. That was originally done as "todo" in the JSON config, but then we moved to unedited in the chapter meta-data but think this would be better stored in the JSON config as otherwise would need to store each variation that's available multiple times in each chapter.

Let's say only the JS chapter is translated to a particular language. A user lands on the translated JS chapter and tries to continue on to the CSS chapter. If the CSS chapter is only available in English, does the link take them to <lang>/2019/css or en/2019/css? It's not great but I think the first option works. So for a language to be supported, it doesn't necessarily need all content translated; it just needs to have all of the templates/chapters copied over to its <lang> directory. This might actually be good for the translators because the PRs would show a diff of English to the translated version. It also simplifies our routing logic, so once you land in <lang> you stay there until you explicitly change languages. Not sure what it does to SEO. This could be a good use of a flag similar to the "unedited" one, where we mark a chapter as not translated yet. That could trigger a little message on the page asking readers to help us translate it on GitHub.

Update the translation drop down (currently hidden) to only display when translation exists (as specified in previous point).

If we go the route described above, for any language that has some translations done, we would add it to the list of available languages in the dropdown.
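As a rough sketch of how the dropdown list could be derived from the JSON config, assuming a hypothetical config shape where each chapter lists the languages it is available in (the real config may well look different):

```python
import json

# Hypothetical config shape: each chapter lists the languages it exists in.
CONFIG_JSON = """
{
  "2019": {
    "javascript": {"languages": ["en", "es", "fr"]},
    "css": {"languages": ["en", "fr"]},
    "seo": {"languages": ["en"]}
  }
}
"""


def page_languages(config, year, chapter):
    """Languages in which this specific chapter exists (the stricter option)."""
    return sorted(config[year][chapter]["languages"])


def site_languages(config, year):
    """Languages with at least one translated chapter (the option described above)."""
    langs = set()
    for chapter in config[year].values():
        langs.update(chapter["languages"])
    return sorted(langs)


config = json.loads(CONFIG_JSON)
```

With this sample data, `site_languages(config, "2019")` lists Spanish even on the CSS page, while `page_languages(config, "2019", "css")` would not, which is the trade-off between the two approaches.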

Remove top "Start Exploring" link as discussed in #574 (comment) as not enough space for it once we add the Language switcher (and some of us think it's just wrong anyway 😀)

SGTM

Review Language switcher in all screen sizes and browsers to ensure it is in correct place as not really been tested much and templates have changed a fair bit since it was done.

+1

Add missing templates for the other Languages where we have translations. Only the French translation has all the templates needed to be able to generate the HTML for the chapters so only it can launch and the Spanish and Japanese versions can't. To be honest I'd prefer to work on #673 first to hopefully make this easier for the Spanish and Japanese translators.

I don't think #673 is blocking. Let's copy the English templates over so we can start serving some translated content sooner.

Fix #654 before we can launch some of the French chapters.

+1 we need to know what we're doing with the URL slugs. I think it's ok to keep the English naming in the URL? WDYT?

Decide what we are doing with #567 - maybe live with it as is for initial launch?

I think it's ok to leave it in English. If absolutely needed we could regenerate the logo in HTML/CSS or make new images for each language. Not blocking.

Fix #542 and #625 before we can launch the Spanish chapters.

SGTM

I gather visualisations (#558) are out of scope for now and we circle back to them once the text has been translated if there's still volunteers to tackle that?

SGTM

Redirect any missing translations back to English so any references to them in translated chapters don't 404. This should be a 302 (temporary redirect) instead of a 301 (permanent redirect) for SEO purposes. We should also check the cache time of this.

Maybe it's ok to serve the English version from the <lang> URL? If we show a translated note on the page that "this chapter isn't translated yet. if you can help, join the translation effort on github..." maybe that's ok. And if we're worried about SEO I think we could explicitly set noindex on untranslated pages.

@tunetheweb
Member

tunetheweb commented Feb 27, 2020

Let's say only the JS chapter is translated to a particular language. A user lands on the translated JS chapter and tries to continue on to the CSS chapter. If the CSS chapter is only available in English, does the link take them to <lang>/2019/css or en/2019/css?

Maybe it's ok to serve the English version from the <lang> URL? If we show a translated note on the page that "this chapter isn't translated yet. if you can help, join the translation effort on github..." maybe that's ok. And if we're worried about SEO I think we could explicitly set noindex on untranslated pages.

I'm less of a fan of including the English content under <lang>/2019/css with a "translation required" flag/notice to be honest. I can see the advantages (encourages translation, keeps them in lang), but I've just never seen that done before on other sites, so it seems a little weird to me. And for some languages they could be up a long time.

Definitely would need a noindex to avoid SEO issues if we did go down this route.

But also what language to say in <html> tag? <html lang="fr"> when content in English is just wrong and <html lang="en"> for fr URL seems weird (though more correct based on content). Just one of many things to consider.

Plus we're false advertising on other pages (you get sent an English link but notice the drop down and think we have this page available in your local language - cool! click. disappointment 😢). Unless you're saying we still only show the translated languages in the drop downs if an actual translation exists and the <lang>/2019/css is only given if explicitly navigated to (e.g. from <lang>/2019/javascript)? But then we're not making the fact we have some chapters in that language as explicit as we could, so going against the point of having English ones?

All in all I just think it's more pain than it's worth and very unusual to have English placeholders. If it was an ultra short term thing (like the unedited) I could be persuaded but don't think it will be for many languages.

I don't think #673 is blocking. Let's copy the English templates over so we can start serving some translated content sooner.

Mmmm... I'd really like to get this out to be honest. Risk of someone updating an English Language template and forgetting one or more other languages is quite high IMHO (some of them are already out of date) and impact would be higher after launching a language (as we could break a production page if we forgot to update a non-English language template) but possibly more difficult to see (can we really hand on heart say we'd test all languages for every release, and even if we did, spot all problems?).

we need to know what we're doing with the URL slugs. I think it's ok to keep the English naming in the URL? WDYT?

Absolutely OK in my book and very common (see MDN or Wikipedia - though Wikipedia do also sometimes translate URLs (for example)). I think we could add redirects for languages for convenience but have the canonical in English even for other languages.

On that point, do we see many 404s that we should add other redirects for? Could you give me access to Google Search Console so I can have a look and maybe keep an eye on this?

@tunetheweb
Member

Here’s a, perhaps crazy, idea: what about waiting until we have a minimum number of chapters translated (e.g. 5 of them) and then machine translating the rest of them (using Chrome/Google Translate) in the short term to allow launching that language?

Could then add a “Machine translated - please get in touch if you want to help translate this properly” message.

I think that would be better than showing the English version under the translated URL.

@borisschapira
Contributor

Hi! There are two things here: what's best, and what's quicker and more convenient to set up.

I think that adding templates should be the first step for every translation project, and translation teams should be encouraged to do so. I would have added a template to localize a message saying

This page is not translated yet. If you would like to participate in its translation into {lang}, feel free to <a href="https://github.com/HTTPArchive/almanac.httparchive.org/issues/{issue}" title="Help translating this page into {lang}">join the discussion</a>."

Then, we can add the chapter in English to every translation folder with a Front-Matter param containing the Id of the translation issue. If the param is set, the abovementioned piece of text is displayed, the html tag displays an "en" lang attribute, and the page is not indexed. If the param is not set, the page is fully translated.
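To make the proposal concrete, here is a minimal Python sketch of that decision. The `translation_issue` front-matter key and the `page_settings` helper are hypothetical names chosen for illustration; only the issue URL pattern comes from the message above.

```python
def page_settings(site_lang, front_matter):
    """Derive per-page rendering settings from chapter front matter.

    `translation_issue` is a hypothetical front-matter key holding the ID of
    the GitHub issue tracking the translation. When it is present, the page
    body is still the English placeholder.
    """
    issue = front_matter.get("translation_issue")
    if issue is None:
        # Fully translated page: normal language, indexable, no notice.
        return {"html_lang": site_lang, "robots": "index, follow", "notice_url": None}
    # English placeholder: declare the real content language and keep it out of indexes.
    return {
        "html_lang": "en",
        "robots": "noindex",
        "notice_url": f"https://github.com/HTTPArchive/almanac.httparchive.org/issues/{issue}",
    }
```

A template could then use `html_lang` for the `<html>` tag, `robots` for the robots meta tag, and show the localized notice only when `notice_url` is set.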

In the language selector, indeed, we can add an "add a translation" option, then redirect people to… this research?

I think these changes would be pretty fast to implement, while being satisfactory enough.

The only negative side of this approach is that the URLs would not be located. I do not realize the importance of the problem, but I think that, at worst, if we later realize that it is inconvenient, we will be able to implement a fix without invalidating the work previously done.

@tunetheweb
Member

tunetheweb commented Mar 2, 2020

I think that adding templates should be the first step for every translation project, and translation teams should be encouraged to do so.

I kind of agree. However translating the templates did require a fair bit of knowledge of the build system and which parts of the template to translate and not. We also didn't want to put people off translating chapters by demanding this knowledge.

Also it meant an awful lot of keeping the templates up to date. For example the Arabic chapter has a template (to solve the right-to-left issue) but then we didn't make any progress on that language (yet), so we've just given ourselves work to maintain that template for no real reason yet. Which, as someone who's done quite a bit of work on large refactoring PRs for accessibility and the like, proved quite painful. Also meant this template was first alphabetically so comments were put on that for PR reviews. Perhaps that Arabic template should be removed again for now but then we might forget how to do RTL in the templates.

Once #676 lands (assuming we're happy with this) then this all becomes a lot easier and think we can insist on that as part of each language and as part of the first PR for any new language added. This is why I really think we should land that before progressing further on this. Have a look at the French base file and the French home page file to see how that is set up in that branch. And on that note any additional reviews of #676 would be much appreciated! 😉

I would have added a template to localize a message saying

This page is not translated yet. If you would like to participate in its translation into {lang}, feel free to <a href="https://github.com/HTTPArchive/almanac.httparchive.org/issues/{issue}" title="Help translating this page into {lang}">join the discussion</a>."

Then, we can add the chapter in English to every translation folder with a Front-Matter param containing the Id of the translation issue. If the param is set, the abovementioned piece of text is displayed, the html tag displays an "en" lang attribute, and the page is not indexed. If the param is not set, the page is fully translated.

So you're going with @rviscomi's suggestion of having it in English but with a note at the top? What do you think of the idea of machine translating the chapters in the meantime (again with a note at the top admitting it's not a real translation)? Don't think that's a good idea? As you're a non-native English speaker, it would be good to hear your thoughts on this. I'm a native English speaker and luckily most content I look at is in my language, but on the odd occasion I do need to look at a foreign language page I find Google Translate in Chrome better than nothing, and exposing this via the server would bring that "translation" to mobile and other non-Chrome browsers. But those of you that are non-native English speakers may really disagree with this (and again it depends on whether the reader has good enough English so would prefer the English version to a poorly translated native language version).

Also having English placeholders could cause problems with sitemap and hreflang links. Guessing that while we have noindex we should also exclude those pages from the sitemap and hreflang info. Again machine translating might at least give the "real language" (even if a poor translation), so it would avoid those complications and mean the language is really live for all chapters rather than half live.
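For illustration, a small Python sketch of emitting alternate-language links only for real translations, so placeholder pages never get advertised. The `hreflang_links` helper, the `BASE_URL` placeholder and the URL layout are assumptions made for this example.

```python
from html import escape

BASE_URL = "https://example.com"  # placeholder host; the path layout below is an assumption


def hreflang_links(year, chapter, translated_langs):
    """Build <link rel="alternate"> tags only for languages with a real translation.

    English placeholders (or machine translations, if those were to be
    excluded too) are simply left out of `translated_langs` by the caller.
    """
    links = []
    for lang in sorted(translated_langs):
        href = f"{BASE_URL}/{lang}/{year}/{chapter}"
        links.append(
            f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(href)}">'
        )
    return links
```

The same filtered language set could drive the sitemap entries, which keeps the sitemap, the alternate links and the noindex flag from contradicting each other.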

I do worry about having single chapters translated for a long time. E.g. a Dutch SEO expert takes it upon themselves to translate the SEO chapter to Dutch but doesn't have the interest in translating all the other chapters. Do we only launch the Dutch SEO chapter and the other chapters aren't available to view in Dutch (more complex routing logic)? Do we launch this with potentially English versions for the other 19 chapters under the nl URL for a long time (which introduces other problems mentioned above)? Do we machine translate the others so at least have a Dutch version of them even if a poor effort (and, if so, what's the effort in that)?

My personal preference would be, in order of most preferred to least preferred:

  1. Only show real translations and solve the routing problem.
  2. Machine translate remaining chapters as a placeholder.
  3. English placeholders.

In the language selector, indeed, we can add an "add a translation" option, then redirect people to… this research?

We have a Translation label btw so can use this: https://github.com/HTTPArchive/almanac.httparchive.org/labels/Translation which also avoids any false positives (assuming the label is used correctly but I'm pretty good at ensuring that).

I do like the idea of making it easy to direct people who want to start a new translation. However, on the flip side, it does risk adding more languages and complexity that we then don't make progress on (e.g. Arabic, Portuguese, Russian...etc.). That's fine if no code is committed (Portuguese, Russian) as it's just an issue, but less fine once we add code that needs to be maintained (Arabic). Though again once #676 lands the maintenance is a lot less.

The only negative side of this approach is that the URLs would not be located. I do not realize the importance of the problem, but I think that, at worst, if we later realize that it is inconvenient, we will be able to implement a fix without invalidating the work previously done.

Sorry, I'm not sure what you mean by this?

@tunetheweb
Member

Made some good progress on this in #684

One thing I haven't done that maybe should consider is checking if chapter exists and if not flagging in different way in ToC and in next/previous chapter widgets so people realise they will only get English version.
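A rough Python sketch of that check, under assumptions: `toc_entries` and its inputs are hypothetical names for illustration and are not taken from #684.

```python
def toc_entries(chapters, lang, translated):
    """Mark each table-of-contents entry with whether it exists in `lang`.

    `chapters` is the ordered list of chapter slugs and `translated` maps a
    language code to the set of slugs translated into it (both hypothetical).
    Untranslated entries fall back to English so templates can flag them.
    """
    available = translated.get(lang, set())
    entries = []
    for slug in chapters:
        is_translated = slug in available
        entries.append({
            "slug": slug,
            "lang": lang if is_translated else "en",
            "english_only": not is_translated,
        })
    return entries
```

The ToC and the next/previous widgets could then style any entry with `english_only` set differently, for example with an "(EN)" suffix.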

@tunetheweb
Member

One thing I haven't done that maybe should consider is checking if chapter exists and if not flagging in different way in ToC and in next/previous chapter widgets so people realise they will only get English version.

Added this to #684. Could probably do with some design tips but it works.

@tunetheweb
Member

OK I think we're in a good place to release French and Spanish. I'd love to get Japanese in there in the first tranche as they have a load of chapters ready but just need two of the base templates translated for that to be possible.

Other than that you happy to release @rviscomi and any particular day or time you want to launch this?

@rviscomi
Member Author

Thanks Barry! Launching as soon as the translations are ready SGTM, no particular day/time. Let's coordinate in Slack when things go live so I can help promote from @HTTPArchive on Twitter.

How much of a blocker are the base templates for the Japanese translations? Is it a deal-breaker to have part of the site temporarily untranslated? (That might be a nice motivator to draw in translation assistance)

@tunetheweb
Member

How much of a blocker are the base templates for the Japanese translations? Is it a deal-breaker to have part of the site temporarily untranslated? (That might be a nice motivator to draw in translation assistance)

Well they need to exist before the chapters will render so can’t turn on Japanese without them.

We could just upload the English versions to the ja folder I suppose. They aren’t really templates anymore - just a list of content blocks that are slotted into the actual pages (see base.html and base_chapter.html to see what I mean). So things like “Index”, “Next Chapter”, “Previous Chapter”...etc. Plus all the content to create the translated header and footer. So not super critical I guess and could live with English for now. A shame though cause they really are quite short to translate and do think it would mean the whole page looks translated.

Then again visualisations (#558) and Logo (#567) won’t be translated so maybe I am just being too perfectionist here and we should go with English?

@rviscomi
Member Author

Yeah I'd default to English for anything untranslated just to get it out the door as quickly as possible. I'm sure the lack of translation will bother people just enough to do it themselves :)

That reminds me, another nice feature would be a "suggest an edit" UI that enables visitors to step directly into a GitHub edit flow.

@tunetheweb
Member

Yeah I'd default to English for anything untranslated just to get it out the door as quickly as possible. I'm sure the lack of translation will bother people just enough to do it themselves :)

Submitted #690. Had a few of the important terms translated anyway so doesn't look too bad.

That reminds me, another nice feature would be a "suggest an edit" UI that enables visitors to step directly into a GitHub edit flow.

Kind of fits in with #518 for other info we were going to add to chapters. Though those less familiar with Git may prefer to raise Issues than go straight into Editing.

@tunetheweb
Member

Closing this as we're live with French, Spanish and Japanese!
