-
Notifications
You must be signed in to change notification settings - Fork 542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fb matrix divider #328
Fb matrix divider #328
Conversation
to ensure they comply with the limits of the nameslist.
Changes to add support to 360-day and 365-day (no leap year) calendar - see ticket #209 * Additional CALTYPE namelist parameter in MISC section * New ww3_tc1 regtest.
* Updated ww3_bound and ww3_bounc to handle model grids formulated on a rotated pole. * Manual and nml/inp files to updated clarify that ww3_bound/ww3_bounc only accept input spectra formulated on a standard pole grid.
Updates to allow a coupling time step that is different from the model time step. * Includes new regtest (in ww3_tp2.14) for non-default oasis time step. * ww3_tp2.14 regtest added to matrix.base.
@aliabdolali thanks for providing an engineering solution. I'd be happy to check this out. Can you provide usage instructions? |
In line 37, define maxlist. The default is 100 which means 7 seubsets for the whole ~700 tests. |
Thanks @aliabdolali and sorry you originally included the instructions and I missed it. |
regtests/bin/matrix_divider.sh
Outdated
echo " echo ' * end of WAVEWATCH III matrix$i of regression tests *'" >> matrix$i | ||
echo " echo ' **************************************************************'" >> matrix$i | ||
echo " echo ' '" >> matrix$i | ||
sbatch matrix$i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this as this would be machine dependent and this could then not be used by others who are not using slurm or if we port to another machine that does not use sbatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll remove it.
Once other thing that I'll do for ourself is adding another script which does cloning the FB and Dev, download the tar from ftp, prepare matrix, divide it and submit. So I will include this to that.
regtests/bin/matrix_divider.sh
Outdated
# --------------------------------------------------------------------------- # | ||
# 1. clean up and definitions # | ||
# --------------------------------------------------------------------------- # | ||
rm before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also add this to the end for cleaning up? Do we want to add these files to .gitignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure,
Yes, we should add them to .gitignore
This is a great engineering fix and it's really nice that it's automated. We should see if this also works for others. I like the idea of committing this first, which would test that this split produces the same as the non-split matrix, but I still think changing from 24->8 is needed. Something that I don't like about this is that it copies the model folder which then you lose git capabilities because these files aren't then tracked. That aspect of this is not my favorite, but without a major build change this definitely simplifies the running of the regression tests in multiple parts and that is really nice. I also like the fact that all the regression tests are already in the regrets folder. When I do this manually I usually just use multiple clones. A possible future improvement would be to look at how we are splitting them instead of just equally into parts. For example, serial jobs could be separated and appropriate resources could be requested. Or other splitting for knowing which tests take what amount of time. |
Later, I will add something to check if the job is done, then all extra model? should be deleted. I'll modify it now to reflect your suggestions and ask you to review, after the mege, we will ask others to check at their ends. |
There's no need to merge before others check, they can check before it is merged. |
@JessicaMeixner-NOAA Do you want me to change 24 to 8 in this FB? |
No, because the regression tests results will change and this should be a clean comparison for this commit. |
@JessicaMeixner-NOAA @ukmo-ccbunney @thesser1 @mickaelaccensi |
Hi @aliabdolali I have one small issue when running the script though - the path to To get it to run I had to copy my Perhaps you always have your matrix files in the |
@JessicaMeixner-NOAA - something that I have been working on (but is not working properly yet) is having a "build" directory defined in WW3 that contains the Perhaps this might tie in with Ali's changes at a later date (to avoid copying the whole model directory?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @aliabdolali - I hope you don't mind me hijacking the review.
I've spotted a few "rm" commands that need a "-r" flag to remove the model* directories.
regtests/bin/matrix_divider_p.sh
Outdated
echo " echo ' * end of WAVEWATCH III matrix$count of regression tests *'" >> matrix$count | ||
echo " echo ' **************************************************************'" >> matrix$count | ||
echo " echo ' '" >> matrix$count | ||
echo "rm ../model$count" >> matrix$count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be "rm -r" to remove directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
regtests/bin/matrix_divider_p.sh
Outdated
echo " echo ' * end of WAVEWATCH III matrix$count of regression tests *'" >> matrix$count | ||
echo " echo ' **************************************************************'" >> matrix$count | ||
echo " echo ' '" >> matrix$count | ||
echo "rm ../model$count" >> matrix$count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be "rm -r" to remove directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
regtests/bin/matrix_divider_p.sh
Outdated
echo " echo ' * end of WAVEWATCH III matrix$count of regression tests *'" >> matrix$count | ||
echo " echo ' **************************************************************'" >> matrix$count | ||
echo " echo ' '" >> matrix$count | ||
echo "rm ../model$count" >> matrix$count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be "rm -r" to remove directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@ukmo-ccbunney Where do you run matrix_ukmet? I assume you do it in regtests. Then, do the same for matrix_divider_p.sh as matrix is located there. |
@ukmo-ccbunney this would be great. We can tackle it later. |
@aliabdolali thanks for addressing all my comments! |
@JessicaMeixner-NOAA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aliabdolali
Hi @thesser1 @ukmo-ccbunney @mickaelaccensi
Following our discussion on the slow speed of our regression suite, I modified the script I used to divide the main matrix into a given number of subsets (it is defined by defining the number of tests in each subset), and tested it with the most recent code. The entire regtest took 3-4 hr in 7 subsets (each 100 test) on 24 cores. It would definitely faster if we change n=24 to n=8 cores.
There is still room for improvement, but I wanted to provide an engineering solution without touching any part of the code. In this way, we just duplicate the model directory and let each individual matrix does its simulation separately from the same parent and store all the solution in one place.
The extra step after matrix file generation is defining the max number of tests in each subset and executing ./bin/matrix_divider.sh
@JessicaMeixner-NOAA, what do you think?
@thesser1 @ukmo-ccbunney @mickaelaccensi could you check it at your end and provide suggestions?
One thing that I'll add in the future is dividing the parallel vs serial tests, so we can optimize the number of tests in each subset better. It will be an easy implementation.