Inconsistent & Perhaps buggy dynamic tuning behavior #7672
Comments
The decision is made based on two principles: use a globally known amount so that the decision is deterministic across all processes, and use an amount that reflects how much data each process is supposed to receive, without taking the algorithm into account. This explains why for allgather we use the comm size (because at the end each process will need to receive that amount), but we are not using the comm size for allreduce. Scan and Exscan are outliers that were copied and pasted from another algorithm without taking the above considerations into account.
@bosilca Can you explain why datatype size * comm size is used for MPI_Gather? I don't really understand why the datatype size is important without the number of elements.
It's not that important whether comm size is used for each collective or not, but MPI_Gather just seems broken: it can only ever tune based on communicator size, not the number of elements being sent.
The comm size is an extremely important tuning parameter, as most of the papers highlight. But you are right, the number of elements should have been part of the gather decision.
I do think comm size is important, but that information is already contained in the file format (since all message sizes already belong to a comm size), so it seems a little extraneous; it's not a big deal. The datatype size * comm size calculation we should just change, though; it seems like a bug we should fix.
#7758 resolves this issue for gather/scatter
Dynamic tuning files are organized in this format:
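For context, a coll/tuned dynamic rules file is a nested, whitespace-separated list of integers. The sketch below is illustrative only — the exact collective ids and field order should be checked against `coll_tuned_dynamic_file.c` — but it shows the nesting that matters for this issue: each block of message-size rules already lives under a specific comm size.

```
1              # number of collectives with rules in this file
3              # collective id (illustrative value)
1              # number of comm-size sections for this collective
64             # comm size this section applies to
2              # number of message-size rules for this comm size
0    1 0 0     # message size, algorithm, fan-in/out, segment size
1024 2 0 0     # from 1024 bytes on, switch to another algorithm
```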
However, the actual definition of "Message size" is vague and depends on the collective in question (https://github.com/open-mpi/ompi/blob/master/ompi/mca/coll/tuned/coll_tuned_decision_dynamic.c).
The following table shows how each collective interprets "Message size":
There are a couple of issues with how this is done. First, because of the way the decision file is parsed, each message size already has a corresponding comm size, so there's no reason to include comm size in the calculation; it gives us no additional data. Second, I don't see how the MPI_Gather implementation's interpretation of message size as datatype size * comm size is correct in any way.
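The gather problem can be made concrete with a small sketch. The code below is illustrative only (it is not the actual Open MPI implementation; both function names are hypothetical): it contrasts the count-based "message size" most collectives use as the rule-lookup key with the count-free computation the thread flags as buggy for MPI_Gather.

```c
#include <stddef.h>

/* Most collectives: message size = bytes each process contributes. */
static size_t msg_size_from_count(size_t dtype_size, size_t count)
{
    return dtype_size * count;
}

/* The MPI_Gather behavior flagged in this issue: the element count is
 * ignored, so gathers of wildly different sizes on the same
 * communicator all map to the same tuning rule. */
static size_t msg_size_gather_buggy(size_t dtype_size, size_t count,
                                    int comm_size)
{
    (void)count; /* count is ignored -- this is the reported bug */
    return dtype_size * (size_t)comm_size;
}
```

With the second interpretation, a 16-process gather of 1 element and one of 1,000,000 elements select the same algorithm, which is why the thread argues the element count should enter the gather decision.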
We should have a more consistent definition of what message size means for our dynamic tuning file.