CPU only support + timeline trace support + training improvements from other pull requests #28

Merged 4 commits on Nov 24, 2016

@ahmedammar commented Nov 19, 2016

Relates to #24

@ahmedammar

I haven't had a chance to test it on a machine with a GPU yet, but it should be OK.

@ahmedammar changed the title from "Add support for cpu-only mode." to "Add support for cpu-only mode. Also enable use of TF's work sharders." on Nov 19, 2016
philokey and others added 2 commits November 21, 2016 18:25
Avoid the problem of training speed become slower iteration by iteration.
@ahmedammar changed the title from "Add support for cpu-only mode. Also enable use of TF's work sharders." to "Add support for cpu-only mode. Also enable use of TF's work sharders. And includes both other pull requests." on Nov 21, 2016
@ahmedammar changed the title from "Add support for cpu-only mode. Also enable use of TF's work sharders. And includes both other pull requests." to "CPU only support and training improvements from other pull requests" on Nov 21, 2016
@ahmedammar commented Nov 21, 2016

With both patches from @philokey and @ICapalija applied, I am now able to train much more efficiently. N.B. I had to modify @ICapalija's patch slightly (see the commits above).

This isn't using VGG16, so your mileage may vary:

iter: 67900 / 70000, total loss: 0.6029, rpn_loss_cls: 0.0366, rpn_loss_box: 0.2336, loss_cls: 0.0485, loss_box: 0.2843, lr: 0.000100
speed: 0.126s / iter
iter: 67910 / 70000, total loss: 0.9322, rpn_loss_cls: 0.0310, rpn_loss_box: 0.1026, loss_cls: 0.2336, loss_box: 0.5650, lr: 0.000100
speed: 0.126s / iter
iter: 67920 / 70000, total loss: 0.6732, rpn_loss_cls: 0.0729, rpn_loss_box: 0.2227, loss_cls: 0.1087, loss_box: 0.2690, lr: 0.000100
speed: 0.126s / iter
iter: 67930 / 70000, total loss: 0.9137, rpn_loss_cls: 0.0306, rpn_loss_box: 0.2416, loss_cls: 0.1938, loss_box: 0.4477, lr: 0.000100
speed: 0.126s / iter
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:00:1E.0     Off |                    0 |
| N/A   82C    P0   120W / 149W |  10941MiB / 11439MiB |     78%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2815    C   python                                       10937MiB |
+-----------------------------------------------------------------------------+
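
For anyone who wants to compare their own runs against the numbers above, here is a minimal sketch (not part of this PR) that parses log lines in the format shown and summarises them. The names `parse_log` and `summarize` are hypothetical helpers, and the regular expressions assume the exact line layout pasted above. The sketch also checks whether the reported total loss matches the sum of the four component losses, which holds for these lines up to rounding.

```python
import re

# Log lines copied from the comment above.
LOG = """\
iter: 67900 / 70000, total loss: 0.6029, rpn_loss_cls: 0.0366, rpn_loss_box: 0.2336, loss_cls: 0.0485, loss_box: 0.2843, lr: 0.000100
speed: 0.126s / iter
iter: 67910 / 70000, total loss: 0.9322, rpn_loss_cls: 0.0310, rpn_loss_box: 0.1026, loss_cls: 0.2336, loss_box: 0.5650, lr: 0.000100
speed: 0.126s / iter
iter: 67920 / 70000, total loss: 0.6732, rpn_loss_cls: 0.0729, rpn_loss_box: 0.2227, loss_cls: 0.1087, loss_box: 0.2690, lr: 0.000100
speed: 0.126s / iter
iter: 67930 / 70000, total loss: 0.9137, rpn_loss_cls: 0.0306, rpn_loss_box: 0.2416, loss_cls: 0.1938, loss_box: 0.4477, lr: 0.000100
speed: 0.126s / iter
"""

# Assumed line formats, matching the output pasted in this thread.
ITER_RE = re.compile(
    r"iter: (\d+) / (\d+), total loss: ([\d.]+), rpn_loss_cls: ([\d.]+), "
    r"rpn_loss_box: ([\d.]+), loss_cls: ([\d.]+), loss_box: ([\d.]+), lr: ([\d.]+)"
)
SPEED_RE = re.compile(r"speed: ([\d.]+)s / iter")


def parse_log(text):
    """Return one dict per 'iter:' line, with the following 'speed:' value attached."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        m = ITER_RE.match(line)
        if m:
            total, rpn_cls, rpn_box, cls, box, lr = map(float, m.groups()[2:])
            records.append({
                "iter": int(m.group(1)),
                "total_iters": int(m.group(2)),
                "total_loss": total,
                "components": (rpn_cls, rpn_box, cls, box),
                "lr": lr,
            })
            continue
        m = SPEED_RE.match(line)
        if m and records:
            records[-1]["speed"] = float(m.group(1))
    return records


def summarize(records):
    """Mean total loss, estimated seconds left, and the largest total-vs-sum gap."""
    last = records[-1]
    speed = last.get("speed")
    remaining = None if speed is None else (last["total_iters"] - last["iter"]) * speed
    return {
        "mean_total_loss": sum(r["total_loss"] for r in records) / len(records),
        "seconds_remaining": remaining,
        "max_component_gap": max(
            abs(sum(r["components"]) - r["total_loss"]) for r in records
        ),
    }


summary = summarize(parse_log(LOG))
print(summary)
```

The component gap stays within the four-decimal rounding of the printed values, so a noticeably larger gap in another run would suggest the log format differs from the one assumed here.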

@ahmedammar changed the title from "CPU only support and training improvements from other pull requests" to "CPU only support + timeline trace support + training improvements from other pull requests" on Nov 21, 2016