
Windows 10 Pro, CUDA 11.0 +PyTorch 1.7.1+ issue #866

Closed
maaquib opened this issue Dec 11, 2020 · 11 comments
Labels: triaged_wait (Waiting for the Reporter's response)

Comments

@maaquib (Collaborator) commented Dec 11, 2020

On behalf of @jeffxtang

On my Windows 10 Pro machine I had CUDA 11.0 and PyTorch 1.7.1+ (it was 1.7.0, but was upgraded to 1.7.1 when I ran python .\ts_scripts\install_dependencies.py --environment=dev with == changed to >= in requirements\torch.txt).
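For context, the requirements\torch.txt edit described above would look something like the following diff (illustrative only; the exact pinned versions and other entries in the file may differ):

```diff
-torch==1.7.0
+torch>=1.7.0
```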

(base) PS C:\Users\Warrior\github> & 'C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe'
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 452.57       Driver Version: 452.57       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------|
|   0  Quadro GP100       WDDM  | 00000000:01:00.0 Off |                  Off |
| 26%   31C    P0    25W / 235W |     89MiB / 16384MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

pip list | grep torch
torch    1.7.1+cu110

# inside ipython
In [6]: torch.cuda.get_device_name(0)
Out[6]: 'Quadro GP100'
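For anyone reproducing this setup, a quick way to confirm which CUDA build PyTorch is using (a minimal sketch; assumes PyTorch is installed):

```python
import torch

# Report the installed PyTorch build and, if a GPU is visible,
# the CUDA toolkit version the wheel was compiled against.
print(torch.__version__)                  # e.g. '1.7.1+cu110'
if torch.cuda.is_available():
    print(torch.version.cuda)             # e.g. '11.0'
    print(torch.cuda.get_device_name(0))  # e.g. 'Quadro GP100'
else:
    print("CUDA not available to this PyTorch build")
```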

The first time I ran python .\torchserve_sanity.py, 151 tests completed, 20 failed - result win_gpu_torch171_sanity_test.txt

The TorchServe on Windows troubleshooting guide says "you may have to change the port number for the inference, management and metrics APIs as specified in frontend/server/src/test/resources/config.properties, all files in frontend/server/src/test/resources/snapshot/*, and frontend/server/src/main/java/org/pytorch/serve/util/ConfigManager.java", but it's unclear exactly how to make those changes. So I discarded the requirements/torch.txt changes, installed torch 1.6.0 etc., and reran the sanity test. The result is in win_gpu_torch160_sanity_test.txt, with the same number of failed tests, which may be caused by the same error (OSError: [WinError 10013] An attempt was made to access a socket) reported in issue #828.

Originally posted by @jeffxtang in #851 (comment)

@lokeshgupta1975 (Collaborator):

@jeffxtang, can you now verify and close this issue?

@jeffxtang (Contributor):

> @jeffxtang, can you now verify and close this issue?

Do you mean to test again and verify that the problem is gone, or just verify what @maaquib (thanks!) posted for me and close it? I may not have time to test this again today.

@jeffxtang (Contributor):

Did another sanity test with commit 2541292 and was stuck at 82% EXECUTING for about an hour. The full log is attached. Is this still caused by "OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions"?

sanity_log_win_gpu_pt171.txt

@jeffxtang (Contributor):

About the error "OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions": it was mentioned at #828 (comment), and the Windows Native Troubleshooting guide may help - "If you are building from source then you may have to change the port number for inference, management and metrics apis as specified in frontend/server/src/test/resources/config.properties"

curl http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg works for me. But how should the lines below in config.properties be changed? I tried changing inference_address to https://127.0.0.1:8080 or 9000 but still got errors.

inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445

@maaquib maaquib added this to the v0.3.0 milestone Dec 16, 2020
@pytorch pytorch deleted a comment from dhaniram-kshirsagar Dec 16, 2020
@harshbafna (Contributor):

> About the error "OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions", it was mentioned at #828 (comment) and the Windows Native Troubleshooting may help - "If you are building from source then you may have to change the port number for inference, management and metrics apis as specified in frontend/server/src/test/resources/config.properties"
>
> curl http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg works for me. But how should the lines below in config.properties be changed? I tried changing inference_address to be https://127.0.0.1:8080 or 9000 but still got errors.
>
> inference_address=https://127.0.0.1:8443
> management_address=https://127.0.0.1:8444
> metrics_address=https://127.0.0.1:8445

@jeffxtang: For running on HTTPS, you will need to generate and provide the private_key_file and certificate_file parameters in your config.properties file.
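For reference, a minimal config.properties sketch with the SSL parameters mentioned above might look like this (the key and certificate file names are placeholders; they must point at files you generate yourself, e.g. with openssl):

```
inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445
# Placeholder paths - substitute your own generated key/certificate
private_key_file=mykey.key
certificate_file=mycert.pem
```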

@harshbafna (Contributor):

> Did another sanity test with the commit 2541292 and was stuck at 82% EXECUTING for about an hour. Full log is attached. Is this still caused by "OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions"?
>
> sanity_log_win_gpu_pt171.txt

@jeffxtang:

From the shared logs, I am still observing the same error at different places and a bunch of test cases have failed.

    2020-12-15 12:16:56,694 [INFO ] W-9012-respheader_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
    2020-12-15 12:16:56,694 [INFO ] W-9012-respheader_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
    2020-12-15 12:16:56,694 [INFO ] W-9012-respheader_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "C:\Users\Warrior\repos\serve\ts\model_service_worker.py", line 182, in <module>
    2020-12-15 12:16:56,694 [INFO ] W-9012-respheader_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
    2020-12-15 12:16:56,694 [INFO ] W-9012-respheader_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "C:\Users\Warrior\repos\serve\ts\model_service_worker.py", line 141, in run_server
    2020-12-15 12:16:56,694 [INFO ] W-9012-respheader_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.sock.bind((self.sock_name, int(s...
    2020-12-15 12:16:56,694 [INFO ] W-9012-respheader_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions

The TorchServe sanity suite executes a bunch of integration test cases which use different ports to load the model workers. In your case it seems your user doesn't have access to some ports, or the required ports are already in use by some other process, hence the errors. We haven't observed any such error on our build/test machines/environments, which have full permissions for accessing these resources.
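As a quick way to check whether this is a port-permission or port-in-use problem, one could probe the candidate ports directly. This is a hypothetical sketch, not part of TorchServe; the 9000+ range is an assumption based on the W-9012 worker name in the log above.

```python
import socket

def port_bindable(host: str, port: int) -> bool:
    """Try to bind a TCP socket to (host, port).

    Both WinError 10013 (permission denied, e.g. Windows excluded
    port ranges) and 'address already in use' surface as OSError.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))
        return True
    except OSError:
        return False

if __name__ == "__main__":
    # Probe a range of likely backend-worker ports (assumed range).
    for port in range(9000, 9016):
        status = "free" if port_bindable("127.0.0.1", port) else "blocked/in use"
        print(f"{port}: {status}")
```

On Windows, `netsh interface ipv4 show excludedportrange protocol=tcp` can also show reserved port ranges that trigger WinError 10013.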

From the other threads, I understand that you were able to successfully install from source and run basic model serving through the REST as well as gRPC APIs, and it's the frontend build that is failing for you?

@jeffxtang (Contributor):

@harshbafna yes, I'm able to run basic model serving with the REST and gRPC APIs but haven't made the sanity test work. I'll try to make HTTPS work to see if it'll fix the problem.

@jeffxtang (Contributor):

I followed the two options in the configuration docs' Enable SSL examples and retried the sanity test. For option 1, I was still stuck at 82% with the same errors; for option 2, the test finished quickly with these messages:

153 tests completed, 1 failed, 152 skipped
> Task :server:test FAILED
FAILURE: Build failed with an exception.

@maaquib (Collaborator, Author) commented Dec 16, 2020

@jeffxtang What was the failed test? Can you provide the logs?

@jeffxtang (Contributor):

> @jeffxtang What was the failed test? Can you provide the logs?

sanity_log_win_failed.txt

@harshbafna (Contributor):

@jeffxtang

Could you also zip and attach the reports generated at the following path:

file:///C:/Users/Warrior/repos/serve/frontend/server/build/reports/

The shared logs show it ran into a FileNotFound exception while trying to start the test suite.

> I followed the configuration Enable SSL's Examples' two options and retried the sanity test but for option 1 I was still stuck at 82% with the same errors and for option 2, the test finished quickly with the messages

You should not make any changes in the config.properties file when running the test suites; it will mess up the expected outputs. The reason for pointing you to that doc was that you were trying to use https in the config.properties.

The sanity suite, regression suite, or any test case should be executed without any changes in the config, unless you have made corresponding changes in the code.

To run the sanity suite, all you need to do is run ts_scripts/install_dependencies.py followed by torchserve_sanity.py.
Please also ensure you have gone through all the prerequisites specified in the TorchServe on Windows native documentation.

@maaquib maaquib modified the milestones: v0.3.0, v0.4.0 Dec 18, 2020
@harshbafna harshbafna added the triaged_wait Waiting for the Reporter's resp label Dec 21, 2020