Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmarking to faker-ruby #2851

Closed
stefannibrasil opened this issue Nov 13, 2023 · 6 comments · Fixed by #2855
Closed

Add benchmarking to faker-ruby #2851

stefannibrasil opened this issue Nov 13, 2023 · 6 comments · Fixed by #2855

Comments

@stefannibrasil
Copy link
Contributor

stefannibrasil commented Nov 13, 2023

[this issue will be assigned to participants of the RubyConf2023 Hack Day]

There are a couple of issues regarding performance:

As a first step to tackling these performance issues, we would like to start measuring the performance of faker-ruby as part of the test suite by adding a benchmarking tool.

This tool or script would run as part of the test suite and measure:

  • load time for the library
  • generator execution speed
  • changing locales and measuring load time and performance

Resources:

Other gems do something similar, here are some examples:

@stefannibrasil stefannibrasil changed the title Add benchmarking for faker-ruby Add benchmarking to faker-ruby Nov 13, 2023
@stefannibrasil
Copy link
Contributor Author

assign to @salochara

@jhawthorn
Copy link

$ time ruby -e 'require "faker"; Faker::Coin.flip'
ruby -e 'require "faker"; Faker::Coin.flip'  0.47s user 0.05s system 99% cpu 0.519 total

$ vernier run -- ruby -e 'require "faker"; Faker::Coin.flip'
starting profiler with interval 500
#<Vernier::Result 0.514918 seconds, 1 threads, 927 samples, 185 unique>
written to /var/folders/0_/nnfg8lzj3t31jks4jdkccl4r0000gn/T/profile20231113-87519-a67roy.vernier.json

https://share.firefox.dev/49wviXk

salochara added a commit to salochara/faker that referenced this issue Nov 13, 2023
Add benchmark rake task for evaluating time execution. Fixes

Co-authored-by: Rubens Fernandes <[email protected]>
salochara added a commit to salochara/faker that referenced this issue Nov 13, 2023
Add benchmark rake task for evaluating time execution. Fixes

Co-authored-by: Rubens Fernandes <[email protected]>
@salochara salochara mentioned this issue Nov 13, 2023
2 tasks
@stefannibrasil
Copy link
Contributor Author

Next steps is to try to use I18n LazyLoadable: https://github.com/ruby-i18n/i18n/blob/master/lib/i18n/backend/lazy_loadable.rb to see if it makes a difference in terms of loading time and performance. Not sure if we will need to update the locale files paths.

stefannibrasil pushed a commit that referenced this issue Nov 21, 2023
* Add benchmark gem to Faker

* Fix #2851

Add benchmark rake task for evaluating time execution. Fixes

Co-authored-by: Rubens Fernandes <[email protected]>

* Commit PR suggestions by Thiago.
- Build all_methods outside the benchmark since we don't want to measure that.

* Commit PR suggestions by Steffani.

- Avoid calling the same object twice

---------

Co-authored-by: Rubens Fernandes <[email protected]>
@salochara
Copy link
Contributor

salochara commented Jan 12, 2024

Hello @stefannibrasil 👋
I hope you're doing great.

I spent some time today looking at LazyLoadable. It seems like a pretty powerful alternative for improving time performance in testing environments.

I read through the implementation and discussion in this PR ruby-i18n/i18n#612.
It turns out, LazyLoadable was implemented by Shopify devs and the industry proof they have, is that their test suite was 20% faster (as shown in the PR above).

What I'm not sure, is why they suggest this LazyLoadable backend is only meant to be for testing environments. I'm not sure what's the reasoning behind.

In any case, regarding Faker and using the LazyLoadable backend, I did some testing.
In faker.rb, I simply added the following line:
I18n.backend = I18n::Backend::LazyLoadable.new(lazy_load: true)
That's all there is really to enable LazyLoadable.

The thing is that, whenever that's enabled, when running rake a bunch of errors come up.

LazyLoadable is expecting a certain naming system for the locales, and in some cases, we don't have that in Faker.
These are: (as specified here: ruby-i18n/i18n#612)

How does the LazyLoadable Backend know which files belong to which locale?
It makes assumptions about how files are named. Clients must abide by this naming system if they decide to use this backend.

The heuristic used to bind a file to its locale can be defined as follows:

the filename is in the I18n load path
the filename ends in a supported extension (ie. .yml, .json, .po, .rb)
the filename starts with the locale identifier
the locale identifier and optional proceeding text is separated by an underscore, ie. "_".

So to move forward, I have mainly 2 issues/questions

  1. I'm not sure why LazyLoadable is only proposed for testing enviroments. If the testing enviorment is the only one we can enable it in, do you think is worth adding it to Faker?
  2. There are tons of errors that show up when enabling the LazyLoadable backend. If we do want to implement it, we would have to start by fixing those errors, and then be able to measure the benefits from having it enabled.

What are your thoughts on this?

All the best!
Salomón.

@salochara
Copy link
Contributor

Hello! @stefannibrasil

Just a quick update.
I started a discussion on this PR on the reasoning behind this feature being only for testing environment. The authors of that PR very kindly explained the reasoning (see here).

Bottom line... the lazy loading feature only makes sense in the testing environment.
With the context I provided on my previous comment.. do you think is worth adding it to our testing suite?

All the best!
Salo.

@thdaraujo
Copy link
Contributor

thdaraujo commented Jan 15, 2024

hey @salochara , thanks for the investigation!

And yeah, what they said in here makes a lot of sense.

You want translations to be available in memory so you can use them as fast as possible in production. But in your test environment, you're not going to use all the translations in your tests anyway. So adding lazy loading gives you faster boot times, which lets your tests run quickly.

I think the way Faker is used puts us in a tricky situation:

On one hand, when you're using Faker in your test environment, you want generators to be fast so your test runs quickly, and for that you do want all translations already loaded in memory.

On the other hand, you also want your tests to boot fast, so running all your specs doesn't take too long. And in that case, you don't want to load all your translations, because it takes time to read and parse them.

Also, it can be hard to know which translation file Faker will need at boot time, because this depends on the generator you're using, so lazy loading would certainly help here.

In summary, we want it all. 😆

When running faker in production/staging for generating example data, this is not a problem, because you won't be rebooting the app very often, so lazy loading is not needed here.

--

But I think there are a couple of ideas we could think about:

  1. what if we only load the default locale on boot, and only load other locales when they're needed / when a new locale is set?

We might be doing this right now, but I'm not sure, would have to see how i18n behaves. But if not, maybe this could be a potential improvement. Assuming that we only load :en locales, for example, would that make any difference?

  1. if adding lazy loading to i18n is really a requirement, could we build a custom lazy loader for i18n that understands the way Faker locales are organized?

I think we could come up with a loader that could map a given generator to its required locale and load it when necessary.

  1. can we just load locales faster?

IMO I think this is the more interesting question. Can we just make locales load very fast?

Assuming that the user won't be adding their own custom translations/locales to Faker, we can say that all possible locales are fixed when we generate a new release of Faker.

If that's true, then we could just transform all locales into something that can be loaded quickly.

Then this becomes a problem of data serialization.

One option that someone suggested was transforming our yaml locales into json, because json parsing is probably way faster.

Another option would be to turn the locales into a binary format that is easy to load into memory via marshalling. This wold then be used as a backend for all generators.

Our locales could still be yaml files, but we transform them into something that can be loaded faster and package that with the gem.

I think option 3 is worth exploring more, but I'd be interested in your opinion @salochara @stefannibrasil

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants