Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix noise seed inconsistency #256

Merged
merged 22 commits into from
Jul 26, 2023
Merged

Conversation

marcobortolami
Copy link
Contributor

This PR has been created to fix the inconsistency of the noise generation pointed out in #255.

@marcobortolami
Copy link
Contributor Author

Up to now I updated simulations.py and noise.py. The only difference with respect to what discussed in #255, I think, is that I removed the default also in Simulation.add_noise in addition to the functions in noise.py. I made this choice for a better consistency among the function/method calls that generate noise: in this way, whenever generating noise, the parameter random=... must always be specified. Otherwise, only for Simulation.add_noise() there would be the possibility not to specify a RNG. Feel free to change this:)

@marcobortolami
Copy link
Contributor Author

marcobortolami commented Jul 13, 2023

Now, these things remain to be done:

  • Add the random_seed value in the automatically generated section in the report. In my opinion it's better to keep track of the seed;
  • Add a new page in the documentation dedicated to the random numbers;
  • Add a test that checks the (non) reproducibility when passing (a None) an integer random_seed;
  • Fix ALL the tests, that are failing because a breaking change has been introduced in a class used basically everywhere;
  • Fix the jupyter notebooks used as tutorial;
  • Fix ALL the examples in the documentation and the documentation itself;
  • Change the slides used as tutorial for litebird_sim (e.g. for the Tokyo hands-on meeting) and related material.

I can take care of these modifications. If you want, in the meanwhile, comment the previous commits:)

@marcobortolami
Copy link
Contributor Author

I fixed most of the tests. Only the ones taking parameters from a toml file or from a dictionary are still failing, because random_seed is a parameter that is needed in the constructor call even if it's present in the toml file or in the dictionary.
One way around is to pass it as None in the constructor call, so it does not raise an error and it gets it from the toml file/dictionary later, but this may be confusing or shady.
One better solution is giving a default value to random_seed that is "wrong", like an empty string, and check in the constructor just before init_random whether random_seed still has that value, after collecting parameters directly passed to the constructor, parameters in a dictionary or in a toml file. If this is the case, it means that random_seed has not been passed and a message saying set random_seed to None or int should appear.
What do you think?

@marcobortolami
Copy link
Contributor Author

Since the default random number generator of numpy is actually PCG64 and since SeedSequence accepts None, I uniformed and simplified sim.init_random. I did some tests and verified that:

  • Passing None the random numbers are different for two different ranks and for two different runs of the same script
  • Passing an int the random numbers are different for two different ranks but the same for two different runs of the same script

Copy link
Member

@ziotom78 ziotom78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

litebird_sim/simulations.py Outdated Show resolved Hide resolved
@marcobortolami
Copy link
Contributor Author

With the last commit I:

  • Corrected 2 small bugs (I was using random_seed instead of self.random_seed);
  • Went with the solution of a default value for random_seed that is "" and just before init_random I added a check to ensure that random_seed is not still "". This was needed as, without a default value, it was not possible to pass random_seed with a parameter file or with parameters only;
  • Fixed an additional test;

Time to pass to the examples in the tutorials and documentation:)

@marcobortolami
Copy link
Contributor Author

marcobortolami commented Jul 22, 2023

Documentation, benchmarks, tutorials and teaching material has been fixed

@marcobortolami
Copy link
Contributor Author

I modified test_simulation_random: now it also checks that by passing an integer number as seed the results are reproducible, instead by passing None they are not.

@marcobortolami
Copy link
Contributor Author

marcobortolami commented Jul 24, 2023

For me, this is basically ready. I still have few questions / changes that I list here. Once we answer these, for me we can merge.

  • Change litebird_sim version. This is a breaking change as all the scripts and notebooks instantiating a Simulation object (99% of them?) will fail. I'm not sure how to change the frameworks' version (like 0.10.0 --> 0.11.0). Is it something that is done at the PR level, for example with a commit?
  • Check figures that has changed. Two pictures showing examples of reports in the documentation have been changed as now a new line writing the random_seed appears. How can I check the documentation as it appears from a browser before the PR is merged? Just to verify that the size of the pictures is ok.
  • Names. When implementing this PR, I changed the name of the seed for the RNG from seed to random_seed. Although the usual name for a RNG seed is seed, I made this change as, in my opinion, its usage is more clear. To be more precise, when someone uses a seed is usually when creating a RNG, like rng = np.random.random(seed). However, most of the times people will pass this parameter to the constructor of Simulation, like sim = lbs.Simulation(random_seed=42). Here's my point: with random_ it's more clear what that mandatory parameter is used for (random numbers). I'm aware that it's not customary, so if you don't like this I can change this name everywhere (easy task with a grep, just takes some time). I would also rename sim.random in sim.rng or sim.random_number_generator, but maybe it's too much;)
  • Update changelog. Do this at the very last.

@ziotom78
Copy link
Member

ziotom78 commented Jul 25, 2023

Hi @marcobortolami , awesome work! I checked this PR and everything looks very nice.

Here are a few answers to your questions:

Change litebird_sim version. This is a breaking change as all the scripts and notebooks instantiating a Simulation object (99% of them?) will fail. I'm not sure how to change the frameworks' version (like 0.10.0 --> 0.11.0). Is it something that is done at the PR level, for example with a commit?

No, this is usually tackled by a separate commit on the master branch. (It's better to keep version bumps separated from PR doing actual work.) We'll release a new version once this PR is merged.

We follow semantic versioning, which means that until we hit version 1.0 we just bump the minor version every time we do a new release, without differentiating between breaking/non-breaking changes.

Check figures that has changed. Two pictures showing examples of reports in the documentation have been changed as now a new line writing the random_seed appears. How can I check the documentation as it appears from a browser before the PR is merged? Just to verify that the size of the pictures is ok.

Just type

make -C docs html

from the main directory. The -C docs flag tells Make to run within the docs/ directory, and html is the name of the target. Beware that you will likely need to install a few dependencies, most notably sphinx. (If you're curious, instead of html you can specify epub, latexpdf, …) Once the command completes, just run

firefox docs/build/html/index.html

(assuming you're using Firefox), and you'll be able to browse a local copy of the documentation. Every time you update the RST files, just re-run the make command and refresh the page.

Names. When implementing this PR, I changed the name of the seed for the RNG from seed to random_seed. Although the usual name for a RNG seed is seed, I made this change as, in my opinion, its usage is more clear. To be more precise, when someone uses a seed is usually when creating a RNG, like rng = np.random.random(seed). However, most of the times people will pass this parameter to the constructor of Simulation, like sim = lbs.Simulation(random_seed=42). Here's my point: with random_ it's more clear what that mandatory parameter is used for (random numbers). I'm aware that it's not customary, so if you don't like this I can change this name everywhere (easy task with a grep, just takes some time). I would also rename sim.random in sim.rng or sim.random_number_generator, but maybe it's too much;)

I fully agree, random_seed is much clearer to read and clearly states the intent of the parameter. I like it!

Thank you for your hard work!

@marcobortolami
Copy link
Contributor Author

marcobortolami commented Jul 25, 2023

Thanks for the semantic versioning reference:) we'll release a new version after the PR is merged, understood.
I checked the documentation, thanks!
Ok for the names:)
I updated the changelog!

I'll wait for a thumb up to this comment to know when I can merge!

@ziotom78
Copy link
Member

For me it's perfect! You can merge it whenever you want.

Thanks!

@marcobortolami marcobortolami merged commit a876b0c into master Jul 26, 2023
@marcobortolami marcobortolami deleted the fix_noise_seed_inconsistency branch July 26, 2023 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants