{translated} is a complex internationalization system made easy. Provide a directory where localization JSON files are stored and access entries from any place in your code. Various features are incorporated: recursive string interpolation, custom plural form rules, entry grouping, inheriting localization data from other dialects (see differences between British and American English for a canonical example), and more.
# Install from CRAN:
install.packages("translated")
# Install development version from GitHub:
# install.packages("devtools")
devtools::install_github("ttscience/translated")
To start using {translated} in your project, specify where to find localization JSON files:
library(translated)
# Below is the path to examples shipped with this package
path <- system.file("examples", package = "translated")
trans_path(path)
Localization data is processed behind the scenes, so the user can focus on using the entries:
trans("title")
#> [1] "Predefined number generator"
This was the localization for the default locale, i.e. `en_US` (United States English). The currently set locale can be checked with:
trans_locale()
#> [1] "en_US"
To set a different locale (e.g. `pl_PL` or `pl` for short, my native language), simply pass it as an argument to the same function. Both mentioned forms are acceptable (plus the form with encoding included, for example `pl_PL.UTF-8`), so we’ll use the simpler one:
trans_locale("pl")
The localization immediately changes:
trans("title")
#> [1] "Generator liczb predefiniowanych"
To list all currently available locales, call:
trans_available()
#> [1] "en_UK" "en_US" "pl"
If more than one key is supplied, the same number of translations is returned. This makes it easier to translate vectors of strings, e.g. column names. It works with all other features, too.
trans(c("btn_close", "title"))
#> [1] "Zamknij" "Generator liczb predefiniowanych"
Some entries can have gaps to fill with variable values. They are denoted with braces `{}` inside the translation text, e.g. `"Courtney is {age} years old."`. Pass these variables as named parameters to the `trans()` function (and don’t worry, unused parameters are ignored). Most often they’ll be strings, but anything coercible to a string is valid, especially numbers:
# JSON entry
"btn_insert": "Wstaw {number}"
trans("btn_insert", number = 4)
#> [1] "Wstaw 4"
It is up to the user to provide singular and plural forms for each entry where it’s necessary. {translated} provides the user with an easy method to make it work: all forms for an entry are stored in a list. With an appropriate number-to-form converter, the user only has to supply the number as the `.n` parameter.
For details on defining plural form rules, see the “Plurality rule definition” section.
# JSON entry
"cat": ["brak kotów", "{.n} kot", "{.n} koty", "{.n} kotów"]
trans("cat", .n = 5)
#> [1] "5 kotów"
trans("cat", .n = 1)
#> [1] "1 kot"
It can be difficult to keep track of all the entries in a large JSON file, so a grouping system comes in handy. Say, for example, that you’ve created a large Shiny app with multiple modules and now want it internationalized. You can group your localization entries by module so that you find entries quicker and don’t have to worry about name clashes.
To access an entry within a group, use a `"group.key"` string with dots dividing parts of the path. There is no limit to how deep the grouping may go, so the key may as well look like `"group1.group2.group3.key"`.
And remember: never use a dot inside a key name, as the interpreter cannot distinguish such a dot from a group separator. The easy grouping functionality is worth this slight inconvenience.
# JSON entry
"nouns": {
"behavior": "zachowanie"
}
trans("nouns.behavior")
#> [1] "zachowanie"
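The dotted-path lookup described above can be pictured with a few lines of base R. This is only a sketch under assumptions: `lookup_key()` is a hypothetical helper, not a function exported by {translated}, and the nested list merely mimics what a parsed `translation` map could look like.

```r
# Hypothetical helper (not part of {translated}): resolve a dotted key
# against a nested list that mimics a parsed "translation" map.
lookup_key <- function(translations, key) {
  # Every dot is treated as a group separator, which is why a key name
  # containing a dot could never be reached.
  parts <- strsplit(key, ".", fixed = TRUE)[[1]]
  node <- translations
  for (part in parts) {
    if (!is.list(node) || is.null(node[[part]])) {
      stop("No entry found for key: ", key)
    }
    node <- node[[part]]
  }
  node
}

translations <- list(
  title = "Generator liczb predefiniowanych",
  nouns = list(behavior = "zachowanie")
)

lookup_key(translations, "nouns.behavior")
#> [1] "zachowanie"
```

Because the key is split on every dot, an entry stored under a name such as `"nouns.behavior"` at the top level would not be found by this kind of lookup.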
As it turns out, the {glue} package offers much more flexibility than just inserting predefined variables: it can execute arbitrary code too. This isn’t too helpful on its own; you may as well pass the result of this code as a named parameter to the `trans()` function. However, this has an interesting effect regarding nested translations.
See, it’s a common problem that a phrase may contain more than one noun dependent on its count. You may try to cover all possible cases, but their number grows exponentially. What you can do instead is to split the processing logic between multiple entries, each having one count-dependent part at most, then combine these entries using the `trans()` function inside another entry. See the example below; however, note that this is just one of the possible solutions, perhaps not even the optimal one.
# JSON entry
"result": "Przeskanowałam {trans('file', .n = n_files)} w {trans('dir', .n = n_dirs)}.",
"file": ["{.n} plików", "{.n} plik", "{.n} pliki", "{.n} plików"],
"dir": ["żadnym folderze", "{.n} folderze", "{.n} folderach", "{.n} folderach"]
trans("result", n_files = 4, n_dirs = 1)
#> [1] "Przeskanowałam 4 pliki w 1 folderze."
If you have an idea for a feature that is missing from {translated}, please open an issue on our GitHub repository. Pull requests are welcome as well.
Localization data is stored in JSON files, all inside one folder. Each JSON file may hold data for only one language [1], yet more than one JSON file may be used for a language [2]. Files do not have to sit directly in that folder; subdirectories are allowed (so the user may store all data for a language in its own folder, for example).
Each JSON file identifies the locale it belongs to through a required component: `locale`, stored under `config`. The other required component is the `translation` map, which may contain any number of entries. Furthermore, each locale should define its plurality case assignment as `plural`, stored under `config`, although only one file per locale needs to contain it.
There are two optional `config` fields, neither of which needs to be repeated across files: `inherit` and `default`. The former enables inheriting translation and plurality data from other locales, making it easier to translate dialectal differences. The latter sets the locale as the default for its language (e.g. in the example below, American English is used as the default version of English).
To sum it up, an example JSON structure is shown below:
{
  "config": {
    "locale": "en_US",
    "plural": "n == 0 ~ 1, n == 1 ~ 2, TRUE ~ 3",
    "inherit": "en_UK",
    "default": true
  },
  "translation": {
    "key": "value",
    "plural_key": ["case_1", "case_2", "case_3"],
    "group": {
      "key2": "value2"
    }
  }
}
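One way to picture what `inherit` achieves is a recursive merge in which the inheriting locale overrides only the entries it defines itself. The sketch below uses `utils::modifyList()` on hand-written lists; the entries are made up for illustration, and {translated} may well resolve inheritance differently internally.

```r
# Illustration only: made-up entries, not the package's internal logic.
# Parent locale (en_UK) holds the full set of entries.
en_uk <- list(
  title = "Predefined number generator",
  nouns = list(behavior = "behaviour", cat = "cat")
)

# Inheriting locale (en_US) defines only what differs.
en_us <- list(
  nouns = list(behavior = "behavior")
)

# modifyList() merges nested lists recursively: values from the second
# argument win, everything else is kept from the first.
merged <- utils::modifyList(en_uk, en_us)

merged$nouns$behavior
#> [1] "behavior"
merged$nouns$cat
#> [1] "cat"
merged$title
#> [1] "Predefined number generator"
```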
A plurality rule consists of a set of sequentially evaluated cases, separated by commas (`,`). Each case has two components: a condition and a value. Should the condition evaluate to `TRUE`, its value is returned; otherwise the next case is tested.
Values are placed to the right of their conditions, separated by a tilde (`~`), meaning that conditions should not use a tilde in their code. This should not be a problem, however, as a few simple operations should suffice to create an appropriate set of rules for any language.
A condition has access to one variable: `n`. This is the count that influences the phrase form. After evaluation, a condition should return a single logical value, either `TRUE` or `FALSE`.
Now that we’ve discussed rule structure, let’s break down two examples, starting with the one from the JSON above:
"n == 0 ~ 1, n == 1 ~ 2, TRUE ~ 3"
There are three cases in this example. The first case matches if `n` is equal to 0, returning 1. The second matches with 1 (for a singular form), returning 2. The last case is a “catch-all”, returning 3 for all inputs that did not match previous cases (i.e. any number other than 0 or 1).
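To make the sequential evaluation concrete, here is a minimal base R sketch of how such a rule string could be interpreted. `eval_plural_rule()` is a hypothetical function written for this illustration, not the implementation used by {translated}, and it assumes that conditions contain neither commas nor tildes.

```r
# Hypothetical evaluator (not the package's implementation).
eval_plural_rule <- function(rule, n) {
  # Cases are separated by commas; this naive split assumes that
  # conditions themselves contain no commas.
  cases <- trimws(strsplit(rule, ",", fixed = TRUE)[[1]])
  for (case in cases) {
    # Condition on the left of the tilde, value on the right.
    parts <- trimws(strsplit(case, "~", fixed = TRUE)[[1]])
    # The condition only sees one variable: n.
    condition <- eval(parse(text = parts[1]), list(n = n))
    if (isTRUE(condition)) {
      return(as.integer(parts[2]))
    }
  }
  stop("No case matched n = ", n)
}

english <- "n == 0 ~ 1, n == 1 ~ 2, TRUE ~ 3"
forms <- c("case_1", "case_2", "case_3")

eval_plural_rule(english, 0)
#> [1] 1
eval_plural_rule(english, 1)
#> [1] 2
forms[eval_plural_rule(english, 5)]
#> [1] "case_3"
```

The returned number is used here as an index into the list of forms stored for an entry, which matches how the `plural_key` example pairs three forms with a three-valued rule.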
The analogous rule for Polish is much more complicated:
"n == 0 ~ 1, n == 1 ~ 2, n %% 100 %in% 12:14 ~ 4, n %% 10 %in% 2:4 ~ 3, TRUE ~ 4"
Oof, this is more than double the length of the English rule. But it’s not that difficult in essence.
The first two cases are already familiar: they match 0 and 1. There’s also a “catch-all” at the end, returning 4. And the part in between? It returns 3 for all numbers that give 2, 3 or 4 modulo 10, except [3] numbers that give 12, 13 or 14 modulo 100. The modulo operator `%%` and the inclusion operator `%in%` are the other two key operations.
Of course, any logical operator could be applied as well, so instead of writing `n %% 100 %in% 12:14 ~ 4, n %% 10 %in% 2:4 ~ 3`, one could write `(n %% 10 %in% 2:4) && !(n %% 100 %in% 12:14) ~ 3`. However, it is best to stick to the simplest statements possible.
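The claim that both formulations give the same result can be checked directly. The sketch below rewrites the two variants of the Polish rule as plain R functions (hypothetical names, written only for this comparison) and compares them over a range of counts.

```r
# Variant 1: the exception (12-14 modulo 100) is listed as its own case
# before the general case.
polish_sequential <- function(n) {
  if (n == 0) 1L
  else if (n == 1) 2L
  else if (n %% 100 %in% 12:14) 4L
  else if (n %% 10 %in% 2:4) 3L
  else 4L
}

# Variant 2: the exception is folded into a single combined condition.
polish_combined <- function(n) {
  if (n == 0) 1L
  else if (n == 1) 2L
  else if ((n %% 10 %in% 2:4) && !(n %% 100 %in% 12:14)) 3L
  else 4L
}

counts <- 0:1000
identical(
  vapply(counts, polish_sequential, integer(1)),
  vapply(counts, polish_combined, integer(1))
)
#> [1] TRUE
```

The sequential variant keeps each condition short, at the cost of relying on case order; the combined variant is order-independent but harder to read.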
{translated} is meant to be lightweight, so only the key packages are imported:
Footnotes
1. Storing data in separate files for each language is in line with best practices. Usually there’ll be a different translator assigned for each language and having the data split between files gets rid of the need for merging files.
2. Especially valuable in huge projects with a lot of text, as it enables grouping data in files by topic or module.
3. “Except” means that the exception case has to be listed before the general case it narrows down.