
release tests can pass after image 404 #287

Closed
ssube opened this issue Mar 26, 2023 · 1 comment
Labels: status/fixed (issues that have been fixed and released), type/bug (broken features)
Milestone: v0.9

Comments

ssube commented Mar 26, 2023

[2023-03-26 10:05:09,045] INFO: MainProcess MainThread __main__: test passed: txt2img-sd-v1-5-512-muffin-unipc
[2023-03-26 10:05:09,045] INFO: MainProcess MainThread __main__: starting test: txt2img-sd-v2-1-512-muffin
[2023-03-26 10:05:21,145] INFO: MainProcess MainThread __main__: MSE within threshold: 0.00000 < 0.00010
[2023-03-26 10:05:21,146] INFO: MainProcess MainThread __main__: test passed: txt2img-sd-v2-1-512-muffin
[2023-03-26 10:05:21,146] INFO: MainProcess MainThread __main__: starting test: txt2img-sd-v2-1-768-muffin
[2023-03-26 10:05:27,166] WARNING: MainProcess MainThread __main__: request failed: 404
[2023-03-26 10:05:27,167] INFO: MainProcess MainThread __main__: test passed: txt2img-sd-v2-1-768-muffin
[2023-03-26 10:05:27,167] INFO: MainProcess MainThread __main__: starting test: txt2img-openjourney-512-muffin
[2023-03-26 10:06:27,383] INFO: MainProcess MainThread __main__: MSE within threshold: 0.00000 < 0.00010
[2023-03-26 10:06:27,383] INFO: MainProcess MainThread __main__: test passed: txt2img-openjourney-512-muffin
[2023-03-26 10:06:27,384] INFO: MainProcess MainThread __main__: starting test: txt2img-knollingcase-512-muffin
[2023-03-26 10:06:39,518] INFO: MainProcess MainThread __main__: MSE within threshold: 0.00000 < 0.00010
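The run above shows the problem: the 404 for txt2img-sd-v2-1-768-muffin is only logged as a warning, and the test is still reported as passed because no image comparison ever runs. A minimal sketch of the intended behavior, assuming a requests-based release test script (the helper names and signatures below are illustrative, not the actual onnx-web test code):

```python
from io import BytesIO

import numpy as np
import requests
from PIL import Image


def download_output(host: str, output: str) -> Image.Image:
    # hypothetical helper: fetch a generated image from the output endpoint
    resp = requests.get(f"{host}/output/{output}", timeout=30)
    if resp.status_code != 200:
        # previously a non-200 response was only logged as a warning and the
        # test kept going; raising here forces the test to be marked failed
        raise ValueError(f"request failed: {resp.status_code}")

    return Image.open(BytesIO(resp.content))


def check_test(host: str, output: str, ref_path: str, mse_threshold: float = 0.0001) -> bool:
    try:
        result = download_output(host, output)
    except ValueError:
        # a missing or failed download is a hard failure, not a pass
        return False

    # compare against the reference image on normalized pixel values,
    # matching the 0.00010 threshold shown in the log above
    ref = Image.open(ref_path)
    a = np.asarray(result, dtype=np.float32) / 255.0
    b = np.asarray(ref, dtype=np.float32) / 255.0
    mse = float(np.mean((a - b) ** 2))
    return mse < mse_threshold
```

With the download raising on any non-200 status, a missing output image fails the test instead of silently skipping the MSE check.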
ssube added this to the v0.9 milestone on Mar 26, 2023
ssube added the status/progress (issues that are in progress and have a branch), type/bug (broken features), and status/fixed (issues that have been fixed and released) labels and removed the status/progress label on Mar 26, 2023
ssube commented Mar 27, 2023

This was happening after errors, like the worker dying:

2023-03-27 00:09:56.906687292 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/down_blocks.2/downsamplers.0/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=compute-infer-1 ; expr=cudaMalloc((void**)&p, size);

  0%|          | 0/25 [00:00<?, ?it/s]
[2023-03-27 00:09:56,907] ERROR: 410202 140509079797760 onnx_web.worker.worker: detected out-of-memory error, exiting: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/down_blocks.2/downsamplers.0/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=compute-infer-1 ; expr=cudaMalloc((void**)&p, size);


[2023-03-27 00:10:00,082] INFO: 399712 140298697029184 werkzeug: 10.2.2.16 - - [27/Mar/2023 00:10:00] "GET /api/status HTTP/1.1" 200 -                                                                                                 
[2023-03-27 00:10:00,871] TRACE: 399712 140298722207296 onnx_web.worker.pool: checking in from progress worker thread                                                                                                                  
[2023-03-27 00:10:00,871] TRACE: 399712 140298722207296 onnx_web.worker.pool: empty queue in progress worker for device cuda                                                                                                           
[2023-03-27 00:10:00,871] DEBUG: 399712 140298722207296 onnx_web.worker.pool: enqueueing next job for idle worker                                                                                                                      
[2023-03-27 00:10:00,871] TRACE: 399712 140298722207296 onnx_web.worker.pool: no pending jobs for device cuda                                                                                                                          
[2023-03-27 00:10:01,711] DEBUG: 399712 140298697029184 onnx_web.worker.pool: checking status for finished job: txt2img_0_370d011ea481180c4f5214e8a104c5ea4334affbb5e8d1ffb867887a4d49b84e_1679875795_0.png                            
[2023-03-27 00:10:01,711] INFO: 399712 140298697029184 werkzeug: 10.2.2.16 - - [27/Mar/2023 00:10:01] "GET /api/ready?output=txt2img_0_370d011ea481180c4f5214e8a104c5ea4334affbb5e8d1ffb867887a4d49b84e_1679875795_0.png HTTP/1.1" 200 -
[2023-03-27 00:10:01,716] INFO: 399712 140298697029184 werkzeug: 10.2.2.16 - - [27/Mar/2023 00:10:01] "GET /output/txt2img_0_370d011ea481180c4f5214e8a104c5ea4334affbb5e8d1ffb867887a4d49b84e_1679875795_0.png HTTP/1.1" 404 -         
[2023-03-27 00:10:01,721] INFO: 399712 140298697029184 onnx_web.server.params: request from 10.2.2.16: 25 rounds of ddim using ../models/diffusion-openjourney on any device, 512x512, 6.0, 0 - mdjrny-v4 style a giant muffin         
[2023-03-27 00:10:01,721] WARNING: 399712 140298697029184 onnx_web.utils: invalid selection: None                                                                                                                                      
[2023-03-27 00:10:01,721] WARNING: 399712 140298697029184 onnx_web.utils: invalid selection: None                                                                                                                                      
[2023-03-27 00:10:01,721] INFO: 399712 140298697029184 onnx_web.server.api: txt2img job queued for: txt2img_0_c965bc9bdd893107021237367f1ecf4d694f500740dcf8b8060d67d80de9c573_1679875801_0.png                                        
[2023-03-27 00:10:01,721] TRACE: 399712 140298697029184 onnx_web.worker.pool: jobs queued by device: [(0, 2)] 
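
Since the root cause is the worker dying mid-job while the pool still reports the job as finished, part of the fix belongs on the server side as well. A rough sketch of that idea, using hypothetical class and field names rather than the real onnx_web.worker.pool internals:

```python
from multiprocessing import Process
from typing import Dict, Optional


class DeviceWorker:
    """Hypothetical wrapper around a worker process and its active job."""

    def __init__(self, proc: Process, current_job: Optional[str] = None):
        self.proc = proc
        self.current_job = current_job


def check_workers(workers: Dict[str, DeviceWorker], job_status: Dict[str, str]) -> None:
    """Mark the active job as failed when its worker process has died."""
    for worker in workers.values():
        if worker.current_job is not None and not worker.proc.is_alive():
            # the worker exited (for example after the CUDA out-of-memory error
            # above) before writing its output; record a failure so the job is
            # not reported as finished
            job_status[worker.current_job] = "failed"
            worker.current_job = None
```

Marking the job failed when its worker process is no longer alive keeps /api/ready from returning a finished status for an output that was never written, so clients (including the release tests) see an error instead of a 404.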
