LazyLoadable Backend #612

Merged Feb 8, 2022 (4 commits)

Contributor

@paarthmadan paarthmadan commented Feb 3, 2022

What's in this PR

This PR introduces a new LazyLoadable backend following the proposal written in #592.

What does the LazyLoadable Backend offer?

This backend offers a performance optimization for environments where only a fraction of the app's translation data is actually required, most notably a local test environment.

Unlike the Simple backend, this backend avoids loading every translation file in the load path. Instead, it infers which files need to be loaded based on the current locale. To do so, it imposes a naming format on the files in the load path, which lets the backend reason about which files belong to which locale. We trade the rigidity of the imposed format for the performance gained by loading only the files that are needed.

In other words, this backend avoids the cost of loading unnecessary translation files by selecting only the files needed for the current locale. It lazily initializes translations on a per-locale basis.

How does the LazyLoadable Backend work?

This backend trades the expensive cost of I/O for the cheaper cost of performing string matching on the files in the load path. It makes assumptions about which files belong to a locale and selectively loads only those files.
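
To illustrate the per-locale lazy initialization described above, here is a minimal sketch. It is not the code in this PR: `LazyLocaleStore`, `FAKE_FILES`, and the loader block are hypothetical names, and the hash stands in for reading and parsing real files.

```ruby
# Hypothetical sketch of lazy, per-locale initialization (not the gem's code).
class LazyLocaleStore
  attr_reader :loads

  # The block receives a locale and returns that locale's translations.
  def initialize(&loader)
    @loader = loader
    @translations = {}
    @loads = Hash.new(0) # how many times each locale was loaded
  end

  def lookup(locale, key)
    translations_for(locale)[key]
  end

  def initialized?(locale)
    @translations.key?(locale)
  end

  private

  # Loads a locale on first access only; later lookups reuse the cached hash.
  def translations_for(locale)
    @translations.fetch(locale) do
      @loads[locale] += 1
      @translations[locale] = @loader.call(locale)
    end
  end
end

# Stand-in for the expensive read + parse of translation files.
FAKE_FILES = {
  en: { "greeting" => "Hello" },
  fr: { "greeting" => "Bonjour" },
  de: { "greeting" => "Hallo" }
}.freeze

store = LazyLocaleStore.new { |locale| FAKE_FILES.fetch(locale) }
store.lookup(:en, "greeting")
store.lookup(:en, "greeting")

puts "en loaded #{store.loads[:en]} time(s)"
puts "fr initialized? #{store.initialized?(:fr)}"
```

Only the locale that was actually looked up pays the loading cost; the other locales are never touched.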

How does the LazyLoadable Backend know which files belong to which locale?

It makes assumptions about how files are named. Clients must abide by this naming system if they decide to use this backend.

The heuristic used to bind a file to its locale can be defined as follows:

  1. the filename is in the I18n load path
  2. the filename ends in a supported extension (i.e., .yml, .json, .po, .rb)
  3. the filename starts with the locale identifier
  4. the locale identifier and any optional text that follows it are separated by an underscore, i.e., "_".
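
The heuristic above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: `locale_for_path` and `SUPPORTED_EXTENSIONS` are hypothetical names, and rule 1 (membership in the load path) is assumed to be handled by the caller.

```ruby
# Hypothetical sketch of the filename-to-locale heuristic.
SUPPORTED_EXTENSIONS = %w[.yml .json .po .rb].freeze

# Returns the locale a path is assumed to belong to, or nil if the
# path does not follow the naming format.
def locale_for_path(path)
  extension = File.extname(path)
  return nil unless SUPPORTED_EXTENSIONS.include?(extension) # rule 2

  basename = File.basename(path, extension)
  # Rules 3 and 4: the locale is everything before the first underscore.
  identifier = basename.split("_", 2).first
  return nil if identifier.nil? || identifier.empty?

  identifier.to_sym
end

%w[
  config/locales/en_001.yml
  config/locales/fr.json
  config/locales/pt-BR_admin.rb
  config/locales/en_001.txt
].each do |path|
  puts "#{path} => #{locale_for_path(path).inspect}"
end
```

Because the underscore is the separator in this sketch, a locale identifier that itself contains an underscore would be cut short; hyphenated identifiers such as pt-BR stay intact.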

Working Through An Example

Assume an app's I18n.load_path consisted of the following files:

config/locales/en_001.yml
config/locales/en_002.yml
config/locales/en_003.yml
...
config/locales/en_n.yml
config/locales/fr_001.yml
config/locales/fr_002.yml
config/locales/fr_003.yml
...
config/locales/fr_n.yml
config/locales/de_001.yml
config/locales/de_002.yml
config/locales/de_003.yml
...
config/locales/de_n.yml

A test is run in the local environment which requires a single :en translation. Currently, when the Simple backend is initialized, all files will be loaded into memory.

This results in 3n loads if we assume there are only 3 locales.

With the LazyLoadable backend, we can rely on the naming convention to select only the :en translations, resulting in n loads.
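
The count in this example can be reproduced with a short sketch. The helper `locale_of` and the choice of n = 3 are assumptions made for illustration; no files are read, only paths are compared.

```ruby
# Hypothetical helper: the locale is the part of the basename before "_".
def locale_of(path)
  File.basename(path, ".*").split("_", 2).first&.to_sym
end

n = 3 # files per locale, chosen arbitrarily for the illustration

# Build a load path shaped like the example: en_001.yml, en_002.yml, ...
load_path = %w[en fr de].flat_map do |locale|
  (1..n).map { |i| format("config/locales/%s_%03d.yml", locale, i) }
end

eager_files = load_path                                           # all files
lazy_files  = load_path.select { |path| locale_of(path) == :en }  # :en only

puts "eager loads: #{eager_files.size}"
puts "lazy loads for :en: #{lazy_files.size}"
```

With three locales the eager list holds 3n paths while the filtered list holds n, matching the counts given above.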

When should someone use this backend?

The backend has two working modes: lazy_load and eager_load.

This backend should only be enabled in test environments!

When lazy loading is disabled (the eager_load mode), the backend behaves exactly like the Simple backend, with an additional check that the paths being loaded abide by the format. If a path can't be matched to the format, an error is raised.

It's particularly useful for workloads that operate in the context of a single locale at a time and have many translation files for many locales. For instance, a large Rails workload would benefit from this backend in the local test environment.
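
As a rough configuration sketch: to my understanding, the backend as released in i18n 1.10 selects its mode through a `lazy_load:` keyword argument, but verify this against the version you have installed. The file location and the glob are assumptions.

```ruby
# In a test-only setup file (for example test/test_helper.rb; the location
# is an assumption). Requires the i18n gem, version 1.10 or later.
require "i18n"

# Files must follow the naming format, e.g. en_001.yml or fr.rb.
I18n.load_path += Dir["config/locales/*.{yml,rb}"]

# lazy_load: true loads a locale's files the first time that locale is used.
# lazy_load: false behaves like the Simple backend plus the format check.
I18n.backend = I18n::Backend::LazyLoadable.new(lazy_load: true)
```

Keeping this assignment out of production configuration follows the guidance above that the backend is meant for test environments.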

Benchmarks: Comparing the Simple backend to the LazyLoadable backend

A benchmark setup was used to compare the performance of these two backends.

Table 1: Setup with 10 files per locale, 100 keys in each file:

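| Backend      | Work Performed                | User     | Sys      | Total    | Real     |
|--------------|-------------------------------|----------|----------|----------|----------|
| Simple       | Eager load (:en)              | 0.012764 | 0.000721 | 0.013485 | 0.013503 |
| Simple       | 3 Eager loads (:en, :fr, :de) | 0.012364 | 0.000675 | 0.013039 | 0.013038 |
| LazyLoadable | Eager load (:en)              | 0.004820 | 0.000330 | 0.005150 | 0.005137 |
| LazyLoadable | 3 Eager loads (:en, :fr, :de) | 0.019816 | 0.000847 | 0.020663 | 0.020674 |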

Table 2: Setup with 100 files per locale, 1000 keys in each file:

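| Backend      | Work Performed                | User     | Sys      | Total    | Real     |
|--------------|-------------------------------|----------|----------|----------|----------|
| Simple       | Eager load (:en)              | 1.342190 | 0.020641 | 1.362831 | 1.363569 |
| Simple       | 3 Eager loads (:en, :fr, :de) | 1.344860 | 0.018035 | 1.362895 | 1.363284 |
| LazyLoadable | Eager load (:en)              | 0.478600 | 0.011205 | 0.489805 | 0.489951 |
| LazyLoadable | 3 Eager loads (:en, :fr, :de) | 1.357584 | 0.026064 | 1.383648 | 1.384148 |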

Evaluating the results

The LazyLoadable backend reduces working time because it avoids loading unnecessary files. When loading a single locale, the LazyLoadable backend outperforms Simple: 0.005 vs 0.013 in Table 1 and 0.490 vs 1.363 in Table 2.

This time reduction is a function of the number of locales. With three locales we avoid loading roughly two-thirds of the files, so we see close to a 3x improvement (about 2.6x in Table 1 and 2.8x in Table 2). The saving scales with the number of files avoided.

Note: The LazyLoadable backend performs roughly on par with the Simple backend when it needs to load all translations. The additional overhead of string matching hurts performance in small workloads (compare the three-locale rows in Table 1), but it is negligible in any significant workload compared to the time spent in I/O.
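
The ratios behind this claim can be checked with a few lines of arithmetic. The figures are the "Total" values for the single :en load copied from the two tables; `speedup` and `ideal_speedup` are hypothetical helper names, and the ideal figure assumes load time is proportional to the number of files and that every locale has the same number of files.

```ruby
# "Total" column for a single :en load, copied from Table 1 and Table 2.
TOTALS = {
  "Table 1" => { simple: 0.013485, lazy_loadable: 0.005150 },
  "Table 2" => { simple: 1.362831, lazy_loadable: 0.489805 }
}.freeze

# Measured speed-up: how many times faster LazyLoadable was than Simple.
def speedup(simple:, lazy_loadable:)
  simple / lazy_loadable
end

# Upper bound under the proportionality assumption stated in the lead-in.
def ideal_speedup(total_locales:, needed_locales: 1)
  total_locales.to_f / needed_locales
end

TOTALS.each do |table, totals|
  puts format("%s: %.2fx measured", table, speedup(**totals))
end
puts format("Ideal with 3 locales: %.2fx", ideal_speedup(total_locales: 3))
```

The measured ratios sit a little below the ideal 3x, which is consistent with a fixed overhead that does not shrink with the number of files loaded.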

Industry Proof

At Shopify, we've patched ruby-i18n locally to implement a similar strategy. We've observed close to 10x speed-ups locally in specific tests and a roughly 20% speed-up across the suite.

Conclusions

This backend is designed to bring performance improvements to workloads with a large volume of locales, translation files, and translation keys.

It's designed for the local test environment, and is an opt-in backend.

This makes introspecting the translations loaded by a file easier. It maintains backwards compatibility, as the block is optional.
@@ -0,0 +1,65 @@
require 'test_helper'
Contributor Author

Note: This is a test used to produce the benchmarks. It will be removed before merging.

Collaborator

Please go ahead and remove this now -- I've reviewed this PR.

Contributor Author

@radar Done 👍

@paarthmadan paarthmadan marked this pull request as ready for review February 3, 2022 01:21
Collaborator

@radar radar left a comment

Looks like a lot of great work. Thank you very much for your dedication here, @paarthmadan.

@radar radar merged commit fb7095a into ruby-i18n:master Feb 8, 2022
@radar radar added this to the 1.10 milestone Feb 8, 2022
@radar
Collaborator

radar commented Feb 8, 2022

I'll release this with a fix for #606 as the 1.10 release, ideally by next Monday.

@adrianna-chang-shopify adrianna-chang-shopify deleted the pm/lazy-loadable-backend branch February 11, 2022 20:08
@adrianna-chang-shopify adrianna-chang-shopify restored the pm/lazy-loadable-backend branch February 11, 2022 20:08
@paarthmadan paarthmadan deleted the pm/lazy-loadable-backend branch February 18, 2022 16:43
@salochara

Hello! @paarthmadan 👋
I hope you're doing great.

I'm working on improving performance for the faker gem.
We're evaluating the option of enabling this LazyLoadable backend.
It looks like a pretty awesome improvement 🎉, as shown in the Industry proof you kindly shared.

I just have a question regarding this implementation...
This backend should only be enabled in test environments!
What's the reasoning behind this? 🤔

All the best!
Salomón.

@casperisfine

This backend should only be enabled in test environments!
What's the reasoning behind this?

In production you'd rather load all the translations as part of boot so:

  • The first user to need a translation isn't slowed down by the extra read + parsing (can be very slow on large files)
  • Assuming your app uses a forking server (unicorn, puma, etc.), the data will mostly be in memory pages that are automatically shared by Copy on Write, reducing memory usage.

@paarthmadan
Contributor Author

Hey @salochara, I'd echo all that @casperisfine shared and add that:

The test environment, in particular, is the perfect candidate for lazy loading translations because:

  1. We expect the test environment to be started and stopped frequently
  2. Certain tests don't require any translations
  3. Tests that do require translations typically require a small subset of the entire pool

Together, these factors make lazy loading a good fit: we drastically reduce startup time, we only ever load the translations we need, and we only incur the loading penalty for tests that actually require translations.

Jean provided a great argument for why this shouldn't be used in production; these are additional reasons why it makes sense in the test env.

@salochara

Hi! @paarthmadan @casperisfine 👋
Thank you very much for your responses. I really appreciate it.

Got it. Makes sense.
Now it's very clear why this is intended for the test environment.

Again, thank you very much for your response and the work you guys kindly put out for the community.

All the best! 🙏🏼
