Wrong result while training using GPU #402

Open
ln1equals0 opened this issue Dec 21, 2016 · 1 comment

Comments


ln1equals0 commented Dec 21, 2016

Hi, we have a simple CNN that works fine on the CPU. Since we installed a new GPU (a TITAN X), we want to train the network on the GPU instead, but it gives us wrong results.

This is the code for training on the CPU; it works fine:
[screenshots of the CPU training code]
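
Since the code itself is only in the screenshots, here is a rough sketch of what a CPU-only Torch7 training loop of this kind typically looks like; buildModel() and getBatch() are hypothetical helpers standing in for the poster's actual code:

```lua
require 'nn'
require 'optim'

-- hypothetical helpers, not from the original post
local model = buildModel()                 -- simple CNN ending in nn.LogSoftMax()
local criterion = nn.ClassNLLCriterion()   -- expects log-probabilities
local params, gradParams = model:getParameters()
local optimState = {learningRate = 0.01}

local runningLoss = 0
for iter = 1, 10000 do
   local inputs, targets = getBatch()      -- CPU tensors

   local function feval(x)
      gradParams:zero()
      local outputs = model:forward(inputs)
      local loss = criterion:forward(outputs, targets)
      model:backward(inputs, criterion:backward(outputs, targets))
      return loss, gradParams
   end

   local _, fs = optim.sgd(feval, params, optimState)
   runningLoss = runningLoss + fs[1]

   -- print the average loss every 100 iterations, as described below
   if iter % 100 == 0 then
      print(string.format('iter %d, avg loss %.4f', iter, runningLoss / 100))
      runningLoss = 0
   end
end
```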

This is the result; I made it print the average loss every 100 iterations:
[screenshot of the CPU training loss output]



I then modified it to run on the GPU; these are the changes I made:
[screenshots of the GPU modifications]
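
The actual diff is only visible in the screenshots, but for reference, the usual set of changes when moving a Torch7 script like this from CPU to GPU looks roughly like the sketch below (not the poster's code). A classic source of exactly this symptom is converting the model but keeping the old flattened parameters, or feeding CPU tensors into a CUDA model:

```lua
require 'cunn'   -- CUDA backend for nn (CudaTensor support)

-- move model and criterion to the GPU
model = model:cuda()
criterion = criterion:cuda()

-- important: re-fetch the flattened parameters AFTER :cuda(); the old
-- params/gradParams still point at the CPU storages, so the optimizer
-- would keep updating tensors the GPU model no longer uses, and the
-- loss would sit at the random-guessing level forever
params, gradParams = model:getParameters()

for iter = 1, 10000 do
   local inputs, targets = getBatch()
   -- the data has to be converted too, every batch
   inputs, targets = inputs:cuda(), targets:cuda()
   -- ... rest of the training loop unchanged ...
end
```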

It is indeed much faster, but the result is completely wrong: the loss for the first 100-300 iterations jumps to 3.4+, and it doesn't converge at all; the loss stays at 3.3-3.4 until the end of training.
[screenshot of the GPU training loss output]

Did I miss something? How can I make the training work on the GPU? Thanks.
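
One quick sanity check (a sketch under the same assumptions as above, with a hypothetical getBatch()) is to compare the CPU and GPU forward passes on the same batch; if they disagree by more than float rounding, the problem is in the CPU-to-GPU conversion rather than in the training loop itself:

```lua
require 'cunn'

model:evaluate()                 -- make forward deterministic (dropout/batchnorm)
local inputs = getBatch()        -- one CPU batch

local cpuOut = model:float():forward(inputs:float()):clone()
local gpuOut = model:cuda():forward(inputs:cuda()):float()

-- the two outputs should agree up to float rounding (~1e-5);
-- a large gap points at the conversion, not the optimizer
print('max abs difference:', (cpuOut - gpuOut):abs():max())
```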

@csxmli2016

What is the result if you run test.sh in the ~/torch directory? Does everything pass successfully?
