
release tests can pass after image 404 #287

Closed
ssube opened this issue Mar 26, 2023 · 1 comment
Labels: status/fixed (issues that have been fixed and released), type/bug (broken features)
Milestone: v0.9

Comments

ssube commented Mar 26, 2023

[2023-03-26 10:05:09,045] INFO: MainProcess MainThread __main__: test passed: txt2img-sd-v1-5-512-muffin-unipc
[2023-03-26 10:05:09,045] INFO: MainProcess MainThread __main__: starting test: txt2img-sd-v2-1-512-muffin
[2023-03-26 10:05:21,145] INFO: MainProcess MainThread __main__: MSE within threshold: 0.00000 < 0.00010
[2023-03-26 10:05:21,146] INFO: MainProcess MainThread __main__: test passed: txt2img-sd-v2-1-512-muffin
[2023-03-26 10:05:21,146] INFO: MainProcess MainThread __main__: starting test: txt2img-sd-v2-1-768-muffin
[2023-03-26 10:05:27,166] WARNING: MainProcess MainThread __main__: request failed: 404
[2023-03-26 10:05:27,167] INFO: MainProcess MainThread __main__: test passed: txt2img-sd-v2-1-768-muffin
[2023-03-26 10:05:27,167] INFO: MainProcess MainThread __main__: starting test: txt2img-openjourney-512-muffin
[2023-03-26 10:06:27,383] INFO: MainProcess MainThread __main__: MSE within threshold: 0.00000 < 0.00010
[2023-03-26 10:06:27,383] INFO: MainProcess MainThread __main__: test passed: txt2img-openjourney-512-muffin
[2023-03-26 10:06:27,384] INFO: MainProcess MainThread __main__: starting test: txt2img-knollingcase-512-muffin
[2023-03-26 10:06:39,518] INFO: MainProcess MainThread __main__: MSE within threshold: 0.00000 < 0.00010
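The run above shows the problem: the 404 for txt2img-sd-v2-1-768-muffin is only logged as a warning, and the test is still reported as passed because no image comparison ever runs. A minimal sketch of the intended behavior, assuming a requests-based release test script (the helper names and signatures below are illustrative, not the actual onnx-web test code):

```python
from io import BytesIO

import numpy as np
import requests
from PIL import Image


def download_output(host: str, output: str) -> Image.Image:
    # hypothetical helper: fetch a generated image from the output endpoint
    resp = requests.get(f"{host}/output/{output}", timeout=30)
    if resp.status_code != 200:
        # previously a non-200 response was only logged as a warning and the
        # test kept going; raising here forces the test to be marked failed
        raise ValueError(f"request failed: {resp.status_code}")

    return Image.open(BytesIO(resp.content))


def check_test(host: str, output: str, ref_path: str, mse_threshold: float = 0.0001) -> bool:
    try:
        result = download_output(host, output)
    except ValueError:
        # a missing or failed download is a hard failure, not a pass
        return False

    # compare against the reference image on normalized pixel values,
    # matching the 0.00010 threshold shown in the log above
    ref = Image.open(ref_path)
    a = np.asarray(result, dtype=np.float32) / 255.0
    b = np.asarray(ref, dtype=np.float32) / 255.0
    mse = float(np.mean((a - b) ** 2))
    return mse < mse_threshold
```

With the download raising on any non-200 status, a missing output image fails the test instead of silently skipping the MSE check.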
ssube added this to the v0.9 milestone on Mar 26, 2023
ssube added the status/progress (issues that are in progress and have a branch), type/bug (broken features), and status/fixed (issues that have been fixed and released) labels and removed the status/progress label on Mar 26, 2023
ssube commented Mar 27, 2023

This was happening after errors, like the worker dying:

2023-03-27 00:09:56.906687292 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/down_blocks.2/downsamplers.0/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=compute-infer-1 ; expr=cudaMalloc((void**)&p, size);

  0%|          | 0/25 [00:00<?, ?it/s]
[2023-03-27 00:09:56,907] ERROR: 410202 140509079797760 onnx_web.worker.worker: detected out-of-memory error, exiting: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/down_blocks.2/downsamplers.0/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=compute-infer-1 ; expr=cudaMalloc((void**)&p, size);


[2023-03-27 00:10:00,082] INFO: 399712 140298697029184 werkzeug: 10.2.2.16 - - [27/Mar/2023 00:10:00] "GET /api/status HTTP/1.1" 200 -                                                                                                 
[2023-03-27 00:10:00,871] TRACE: 399712 140298722207296 onnx_web.worker.pool: checking in from progress worker thread                                                                                                                  
[2023-03-27 00:10:00,871] TRACE: 399712 140298722207296 onnx_web.worker.pool: empty queue in progress worker for device cuda                                                                                                           
[2023-03-27 00:10:00,871] DEBUG: 399712 140298722207296 onnx_web.worker.pool: enqueueing next job for idle worker                                                                                                                      
[2023-03-27 00:10:00,871] TRACE: 399712 140298722207296 onnx_web.worker.pool: no pending jobs for device cuda                                                                                                                          
[2023-03-27 00:10:01,711] DEBUG: 399712 140298697029184 onnx_web.worker.pool: checking status for finished job: txt2img_0_370d011ea481180c4f5214e8a104c5ea4334affbb5e8d1ffb867887a4d49b84e_1679875795_0.png                            
[2023-03-27 00:10:01,711] INFO: 399712 140298697029184 werkzeug: 10.2.2.16 - - [27/Mar/2023 00:10:01] "GET /api/ready?output=txt2img_0_370d011ea481180c4f5214e8a104c5ea4334affbb5e8d1ffb867887a4d49b84e_1679875795_0.png HTTP/1.1" 200 -
[2023-03-27 00:10:01,716] INFO: 399712 140298697029184 werkzeug: 10.2.2.16 - - [27/Mar/2023 00:10:01] "GET /output/txt2img_0_370d011ea481180c4f5214e8a104c5ea4334affbb5e8d1ffb867887a4d49b84e_1679875795_0.png HTTP/1.1" 404 -         
[2023-03-27 00:10:01,721] INFO: 399712 140298697029184 onnx_web.server.params: request from 10.2.2.16: 25 rounds of ddim using ../models/diffusion-openjourney on any device, 512x512, 6.0, 0 - mdjrny-v4 style a giant muffin         
[2023-03-27 00:10:01,721] WARNING: 399712 140298697029184 onnx_web.utils: invalid selection: None                                                                                                                                      
[2023-03-27 00:10:01,721] WARNING: 399712 140298697029184 onnx_web.utils: invalid selection: None                                                                                                                                      
[2023-03-27 00:10:01,721] INFO: 399712 140298697029184 onnx_web.server.api: txt2img job queued for: txt2img_0_c965bc9bdd893107021237367f1ecf4d694f500740dcf8b8060d67d80de9c573_1679875801_0.png                                        
[2023-03-27 00:10:01,721] TRACE: 399712 140298697029184 onnx_web.worker.pool: jobs queued by device: [(0, 2)] 
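
Since the root cause is the worker dying mid-job while the pool still reports the job as finished, part of the fix belongs on the server side as well. A rough sketch of that idea, using hypothetical class and field names rather than the real onnx_web.worker.pool internals:

```python
from multiprocessing import Process
from typing import Dict, Optional


class DeviceWorker:
    """Hypothetical wrapper around a worker process and its active job."""

    def __init__(self, proc: Process, current_job: Optional[str] = None):
        self.proc = proc
        self.current_job = current_job


def check_workers(workers: Dict[str, DeviceWorker], job_status: Dict[str, str]) -> None:
    """Mark the active job as failed when its worker process has died."""
    for worker in workers.values():
        if worker.current_job is not None and not worker.proc.is_alive():
            # the worker exited (for example after the CUDA out-of-memory error
            # above) before writing its output; record a failure so the job is
            # not reported as finished
            job_status[worker.current_job] = "failed"
            worker.current_job = None
```

Marking the job failed when its worker process is no longer alive keeps /api/ready from returning a finished status for an output that was never written, so clients (including the release tests) see an error instead of a 404.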
