Allow training on larger datasets #2

Merged 1 commit into normalization from dg/train on Jul 31, 2020
Conversation

DhairyaLGandhi (Member)

So this was a bit unexpected, but it seems to be related to machine precision, since in many cases it was very difficult to get the model to fail even with worst-case inputs. The background is that the current definition of softplus leads to some ill-behaved gradients for large negative values.

julia> gradient(log ∘ exp, -100)
(0.9999999999999999,)
julia> gradient(log ∘ exp, -100 + eps(Float32))
(Inf32,)

(The analytic derivative of log ∘ exp is 1, but the backward pass divides by exp(x), and 1 / exp(-100f0) already overflows Float32, which is presumably where the Inf32 comes from.) This eventually manifested in the model like so:

julia> gradient(x -> sum(softplus.(x)), randn(Float32, 3,3) .* 10f1 )
(Float32[NaN NaN 1.0; NaN 0.0014634054 NaN; NaN 1.7532707f-27 1.0],)

Defining the adjoint for softplus fixes this. There is an upstream PR for that, but I am adding the adjoint here so we don't have to worry about the dependencies too much for the time being.
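
For reference, here is a minimal sketch of what such an adjoint could look like, assuming NNlib's softplus and sigmoid and Zygote's @adjoint macro; the rule in the actual commit may differ. It uses the identity softplus'(x) == sigmoid(x), which stays in (0, 1) and therefore avoids the Inf/Inf and 0/0 intermediates that produce the NaNs above. The rule is written at the broadcast level because that is how softplus is applied in the failing example.

using Zygote, NNlib
using Zygote: @adjoint
using Base.Broadcast: broadcasted

# Hypothetical sketch, not necessarily the committed code: pull back through an
# element-wise softplus using sigmoid, which is bounded in (0, 1) and so cannot
# overflow or underflow the way a naive exp(x) / (1 + exp(x)) pullback can.
@adjoint function broadcasted(::typeof(NNlib.softplus), x)
    y = NNlib.softplus.(x)
    return y, Δ -> (nothing, Δ .* NNlib.sigmoid.(x))
end

With a rule like this in place, the call from the example above, gradient(x -> sum(softplus.(x)), randn(Float32, 3, 3) .* 10f1), should return sigmoid values instead of NaNs.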

cc @rkurchin

DhairyaLGandhi changed the title from "Allow training on larger dataset" to "Allow training on larger datasets" on Jul 31, 2020
rkurchin (Member)

Just tested this 5x on the 15k test set and 5x on the 32k test set with no NaNs; merging!

rkurchin merged commit f0d32fc into normalization on Jul 31, 2020
rkurchin deleted the dg/train branch on July 31, 2020 at 18:02