Make check_isfinite, check_scale optional in clip_global_norm #12042
@leezu can you please follow the community's convention to file a JIRA and update the issue title?
@lupesko I considered this to fall under "PRs with tiny changes", which don't require a JIRA. Tests are currently failing, so I'll need to add some fixes before review can proceed. Sorry for the delay.
python/mxnet/gluon/utils.py
requires a blocking .asscalar() call.
check_scale : bool, default True
    If True, skip array rescaling if max_norm / total_norm >= 1. This
    requires a blocking call. If False, rescale arrays with min(1,
Rescaling is not blocking.
Yes. But previously there was a blocking call to check whether rescaling is necessary. If check_scale is False, we always rescale, possibly by 1.
I see. Maybe we can use contrib.cond for this
contrib.cond would only help when working with the symbolic interface. For ndarray, contrib.cond also uses a blocking .asscalar() call. clip_global_norm always works with the ndarray API.
For now the PR preserves the old default behavior. I'm happy to remove the check_scale argument and always rescale, trading extra computation for fewer blocking calls. That assumes it is always cheaper to do the rescaling than to wait for a blocking asscalar() that might let us skip it.
+1 for removing check_scale. It doesn't seem necessary. We can always perform a scaling.
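The trade-off discussed in this thread can be illustrated with a minimal sketch in plain Python. It uses lists of floats instead of NDArrays, so nothing here actually blocks. The function name clip_sketch, the 1e-8 epsilon, and the list-based interface are assumptions made for illustration and are not the code in python/mxnet/gluon/utils.py. The point is only that multiplying by min(1, max_norm / total_norm) gives the same result as the conditional rescale, without branching on the value of the norm.

```python
import math


def clip_sketch(arrays, max_norm, check_scale=True):
    """Return rescaled copies of `arrays` whose joint 2-norm is at most max_norm.

    Hypothetical stand-in for illustration: `arrays` is a list of lists of floats.
    """
    total_norm = math.sqrt(sum(x * x for arr in arrays for x in arr))
    # The epsilon avoids division by zero; its value is an assumption of this sketch.
    scale = max_norm / (total_norm + 1e-8)
    if check_scale:
        # Branching on `scale` requires knowing its value. With asynchronous
        # arrays, this is where a blocking read would be needed.
        if scale >= 1.0:
            return [list(arr) for arr in arrays], total_norm
    else:
        # No branch on the outcome: always multiply, possibly by exactly 1.
        # In an array framework this would be an elementwise minimum that can
        # be queued like any other operation.
        scale = min(1.0, scale)
    return [[x * scale for x in arr] for arr in arrays], total_norm


# Usage: both modes produce the same clipped values.
grads = [[3.0], [4.0]]  # joint norm is 5
clipped_checked, _ = clip_sketch(grads, 1.0, check_scale=True)
clipped_always, _ = clip_sketch(grads, 1.0, check_scale=False)
assert clipped_checked == clipped_always
```

The only cost of the unconditional variant is one extra multiplication per array in the cases where the norm was already within bounds.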
…#12042)
* Make check_isfinite, check_scale optional in clip_global_norm
  If both are set to false, clip_global_norm does not force any synchronization and throughput can be increased.
* Add tests
* Remove check_scale
* Document return type
* Fix test_gluon_gpu
Description
Make check_isfinite and check_scale optional in clip_global_norm. If both are set to False, clip_global_norm does not force any synchronization, and throughput can be increased. Note that with check_scale=False, all arrays are multiplied by the scale factor even when it is 1, in cases where a blocking check could have skipped the multiplication.
While this PR preserves the old default behavior of using blocking calls, we may want to change the default to improve throughput.
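As a rough illustration of what the optional finiteness check involves, the sketch below computes a global norm over plain Python floats and only inspects its value when check_isfinite is True. The function name total_norm_sketch and the warning text are assumptions for illustration, not MXNet's implementation. With real NDArrays, inspecting the value is the step that would require the blocking .asscalar() call mentioned in the docstring.

```python
import math
import warnings


def total_norm_sketch(arrays, check_isfinite=True):
    """Compute the joint 2-norm of `arrays` (a list of lists of floats).

    Hypothetical stand-in: with asynchronous arrays, the isfinite test below
    is the part that forces the caller to wait for the computed value.
    """
    total_norm = math.sqrt(sum(x * x for arr in arrays for x in arr))
    if check_isfinite and not math.isfinite(total_norm):
        # Warning text is illustrative only.
        warnings.warn(
            "nan or inf detected; clipping results will be undefined.",
            UserWarning,
            stacklevel=2,
        )
    return total_norm


# Usage: the check can be skipped when the caller handles bad values elsewhere.
norm = total_norm_sketch([[3.0], [4.0]], check_isfinite=False)
assert norm == 5.0
```

Skipping the check means a nan or inf norm propagates silently, so the caller has to detect it at some later synchronization point.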
Comments
@eric-haibin-lin