Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reweighting branch merge #349

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

gbergner
Copy link

Currently implemented additions:

  • measurement reweighting factors with DDalphaAMG support
  • improved reweighting monomial for alternative computation (CG)
  • alternative computation of reweighting factors without inversions
  • so far not working: eigenvalue estimation for clover operators

@kostrzewa
Copy link
Member

I haven't yet been able to look at everything, but I'm wondering why we now have yet another executable. In principle, offline_measurement should provide the required functionality, no?

@gbergner
Copy link
Author

gbergner commented Jan 2, 2017

Well the offline_measurement seems to be outdated. I would prefer to remove it or rework it.

@kostrzewa
Copy link
Member

I guess it's missing some of the mixed precision initialisations, correct? Of course, merging the two executables (into measure, say) or removing offline_measurement is absolutely okay, as long as the functionality it provided stays available (which, as far as I can tell, measure proides)

@gbergner
Copy link
Author

gbergner commented Jan 3, 2017

There seem to be several things like also the random seed initialization, which is quite important for the reweighting factors. It seems to be that the offline_measurement has not been maintained for some time and is outdated compared to the invert.c. Therefore I have created a new measure.c derived from the invert.c. Of course, I would prefer to move the invert.c-functionality in some function that can be optionally called from the measure.c. The duplicated code in offline_measurement, measure, invert is bad. However, I would first like to discuss such kind of major changes since several parts are not independent due to the usage of the same workspace of Dirac-Vectors.

Copy link
Member

@kostrzewa kostrzewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, beyond replacing offline_measurement, there are a few points that ought to be discussed. These are essentially only code-duplication issues. Also, we have made it customary to use 2 spaces instead of tabs (even though there are numerous spots in the code where this is not yet implemented), it would be great if you could adjust your tabs accordingly.

meas/reweightingmeas.c Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Show resolved Hide resolved
@gbergner
Copy link
Author

Also, we have made it customary to use 2 spaces instead of tabs (even though there are numerous spots in the code where this is not yet implemented), it would be great if you could adjust your tabs accordingly.

Please let me know all of you conventions in order to implement them. So far I could not deduce any conventions from the current state of the code. I will change this soon.

@kostrzewa
Copy link
Member

Please let me know all of you conventions in order to implement them. So far I could not deduce any conventions from the current state of the code. I will change this soon.

That's pretty much the only consistent one... I'm afraid there are no official conventions, perhaps I'll take the time to write up a coding style howto at some point...

Copy link
Member

@kostrzewa kostrzewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urbach
I've started a review, could you check if you agree / have anything to add?

@gbergner
For the indentation, you can use clang-format if you'd like. In the qphix branch, we are currently using this style file: https://github.com/plabus/tmLQCD/blob/qphix_devel/.clang-format

configure.in Outdated Show resolved Hide resolved
meas/measurements.h Outdated Show resolved Hide resolved
.gitignore Show resolved Hide resolved
measure.c Outdated Show resolved Hide resolved
@kostrzewa
Copy link
Member

@urbach please also look at the outdated comments above, which discussed some of the code-doubling which occurs here

DDalphaAMG_interface.c Outdated Show resolved Hide resolved
Makefile.in Outdated Show resolved Hide resolved
configure.in Outdated Show resolved Hide resolved
doc/main.tex Outdated Show resolved Hide resolved
invert.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
@kostrzewa
Copy link
Member

kostrzewa commented Apr 25, 2017

Regarding (f)lex on Jureca: you just need to load the module (module load flex) The check should certainly be present because read_input.c is not checked into the repository (nor should it be)

@gbergner
Copy link
Author

Yes, I agree the check should be reverted.

@gbergner
Copy link
Author

I have tried to include your suggestions/revert some changes. Any further suggestions?

@kostrzewa
Copy link
Member

@gbergner Since we are continuing the simulation which needs to be reweighted, I wanted to ask about the status of these changes.

meas/reweightingmeas.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
.gitignore Show resolved Hide resolved
DDalphaAMG_interface.c Outdated Show resolved Hide resolved
invert.c Outdated Show resolved Hide resolved
read_input.l Show resolved Hide resolved
.gitignore Show resolved Hide resolved
Makefile.in Outdated Show resolved Hide resolved
meas/reweightingmeas.c Outdated Show resolved Hide resolved
@kostrzewa
Copy link
Member

I realised that you probably didn't see some of my comments because I hadn't "submitted" the review... sorry about that.

@gbergner
Copy link
Author

gbergner commented Oct 4, 2018

Some general comment concerning the offline_measurement.c: I had the impression that there are relevant differences between the offline_measurement.c and the invert.c. Some measurements are rather included in the invert.c than in the offline_measurement.c, which indicates that the latter one might be outdated. However, I had some problems with segfaults when running the invert. Therefore I have included the new measure.c based on the invert.c.
I don't know how to deal with this problem. If all of these files are correct, we can try to merge them.

@gbergner
Copy link
Author

gbergner commented Oct 4, 2018

P.S. The segfault appeared when measuring more than one configuration with the invert, as far as I remember.

@kostrzewa
Copy link
Member

invert doesn't (generally) segfault when doing inversions on multiple configurations, but it is really only designed to do explicit inversions, rather than anything from the "measurements" modules. Support for running "measurements" together with inversions was added simply for convenience, as it would sometimes be nice to do a gradient flow measurement, say, at the same time as the inversion.

To proceed, the first task would be to resolve the merge conflicts. The second task would be to merge measure and offline_measurement. Again, I have no preference as to which executable remains in the end as long as the functionality is preserved. As I mentioned multiple times, offline_measurement is not outdated and the seed issue that you referred to has also been resolved since (although it never mattered in practice for the measurements that were done).

@kostrzewa
Copy link
Member

Also note that the configuration filename has since been extended to 500 characters in all relevant executables and the string is now written safely using snprintf, a change which I've wanted to make for years...

@gbergner
Copy link
Author

gbergner commented Oct 4, 2018

I have been looking at the mode number measurement, which is invert.c. Therefore I have assumed that the online_measurements was outdated.
I have not checked whether the segfault is now still reproducable. I should try this again now.

More generally I don't understand the policy. There are two options:

  1. One file that does everything. In this case there should be no separate invert.c with a copy of the measurements.
  2. Separate files for different measurements.
    Otherwise it is hard to find any conclusive form.

One easy thing for me would be to move the measure.c to offline_measurements.c and delete the original one. I hope everything will still be working in this case.

@kostrzewa
Copy link
Member

I have been looking at the mode number measurement, which is invert.c. Therefore I have assumed that the online_measurements was outdated.
I have not checked whether the segfault is now still reproducable. I should try this again now.

The modenumber measurement was added before there was code review. It should never have ended up in invert...

@kostrzewa
Copy link
Member

One file that does everything. In this case there should be no separate invert.c with a copy of the measurements.

Inversions are quite separate from measurements at present, which is the reason for the split. Some years ago, I introduced offline_measurement to run anything from the measurement modules. At the same time, functionality to do measurements while also doing inversions (i.e., generation of propagators) was introduced into invert for convience reasons.

One easy thing for me would be to move the measure.c to offline_measurements.c and delete the original one. I hope everything will still be working in this case.

This would be good. I would say the reweighting factors are of the measurement type and they should also be accessed via the BeginMeasurement X [...] EndMeasurement syntax in the input file.

@kostrzewa
Copy link
Member

kostrzewa commented Oct 4, 2018

Inversions are quite separate from measurements at present, which is the reason for the split. Some years ago, I introduced offline_measurement to run anything from the measurement modules. At the same time, functionality to do measurements while also doing inversions (i.e., generation of propagators) was introduced into invert for convience reasons.

The interface for inversions is quite different from that for measurements, since it is positively ancient in comparison. One could probably recast various types of "propagator generation" as measurements and then unify things. That would actually be quite nice.

One has to keep in mind, however, that invert is not going to really be used much in the medium-term future. Rather, tmLQCD is linked into a contraction framework in order to serve as a wrapper for the various solvers and for the loading of gauge fields and/or storage of propagators with all the nice checksum calculations which are a pain to re-implement.

@kostrzewa
Copy link
Member

Is there a reason for specifying the kappa steps in an additional file? The tmLQCD input file reader can tokenize lists of comma-separated doubles. See FLTLIST in read_input.l

@gbergner
Copy link
Author

I had to continue the work with the code and I did not know about the plans for the pull request. Next time I will create a new branch for the continuation of the work while waiting for any response regarding the pull request.

@gbergner
Copy link
Author

Probably you mean this kind of solution:
{SPC}*MGNumberOfVectors{EQL}{FLTLIST} {
char paramname[100];
char error_message[200];
int list_end = 0;
int level = 0;

fltlist_tokenize(yytext, paramname, 100);
int n_vec = (int)fltlist_next_token(&list_end);
while( list_end != 1 ){
  if( level >= (QUDA_MAX_MG_LEVEL-1) ){
    snprintf(error_message, 200, "Exceeded maximum number of levels (%d-1) parsing %s!\n", QUDA_MAX_MG_LEVEL, paramname);
    yy_fatal_error(error_message);
  }
  
  quda_input.mg_n_vec[level] = n_vec;
  if(myverbose){ 
    printf("  %s, level %d set to %d line %d\n", paramname,
            level, quda_input.mg_n_vec[level], line_of_file);
  }
  level++;
  n_vec = fltlist_next_token(&list_end);
}

}
May be this is nicer .... I will think about it.

@gbergner
Copy link
Author

OK, I have added the FLTLIST way. Is this consistent with what you had in mind?
Personally, I am not a fan of flex.

@kostrzewa
Copy link
Member

Sure, I would prefer C++ and yaml, but flex works rather nicely for our purposes.

As for the parsing of additional options, dealing with one input file is bad enough in my opinion, so having this inline is probably the better choice. Looks good!

@kostrzewa
Copy link
Member

Does this still work when merged with with the https://github.com/etmc/tmlqcd/tree/DDalphaAMG_nd_merge_etmc_master branch? These two branches are the next to go into the code-base so we need to check for regressions. @gbergner could you test merging with the branch in question to see if the reweighting measurement still works correctly?

@kostrzewa
Copy link
Member

Could you add some documentation of the parameters to the LaTeX docs?

@kostrzewa
Copy link
Member

The fact that this is not general but depends on DDalphAMG still bugs me, but I'm okay with pulling it in if we agree that a general solution should be implemented as soon as possible.

Copy link
Member

@kostrzewa kostrzewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing merge with https://github.com/etmc/tmlqcd/tree/DDalphaAMG_nd_merge_etmc_master required as well as documentation of parameters.

((reweighting_parameter*)meas->parameter)->reweighting_operator = a;
if(myverbose) printf(" ReweightingOperator set to %d line %d measurement id=%d\n", a, line_of_file, meas->id);
}
{SPC}*Samples{EQL}{DIGIT}+ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you call this "NoSamples" to be consistent with other input parameters with the same name?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants