When 1 remote kernel has stopped no files are displayed in the Files tab -> sessions REST API returns total failure as long as just 1 remote kernel API fails
#5057
Closed
stevehaertel opened this issue
Nov 15, 2019
· 3 comments
notebook = 6.0.2 (but same thing happens on 6)
jupyter enterprise gateway = 2.0.0
Problem
When I use Jupyter to launch any number of Spark kernels, if the Spark application is stopped outside of Jupyter, upon logging in, 0 Files are displayed in the Files tab. If I take a look at my networking tab in my browser, I can see that the "sessions" REST API call is failing. I'm not exactly sure what the sessions API is doing (hopefully you can help!) but based on my JEG log output, it looks like Jupyter is calling JEG REST APIs to get info for each of the kernels. If just 1 of those kernel API calls fails, then the entire sessions REST API returns a 504 ({"message": "Error attempting to connect to Gateway server url 'https://[hostname]:8888'. Ensure gateway url is valid and the Gateway instance is running.", "reason": null})
Question
Would it be possible to return the partial list of kernels that it CAN find instead of an entire failure?
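To make the question concrete, here is a minimal sketch of the "partial list" behaviour being asked for. It is not the Notebook or Enterprise Gateway implementation: `fetch_kernel_model`, `list_sessions_partial`, `KernelNotFound`, the shortened kernel ids, and the in-memory `live_kernels` dict are all hypothetical stand-ins for the real per-kernel REST calls. The idea is to collect every per-kernel lookup, keep the ones that succeed, and report the ones that fail instead of failing the whole listing.

```python
import asyncio


class KernelNotFound(Exception):
    """Hypothetical error standing in for a failed gateway lookup (e.g. a 404)."""


async def fetch_kernel_model(kernel_id, live_kernels):
    # Hypothetical stand-in for the per-kernel REST call to the gateway.
    if kernel_id not in live_kernels:
        raise KernelNotFound(kernel_id)
    return {"id": kernel_id, "execution_state": live_kernels[kernel_id]}


async def list_sessions_partial(sessions, live_kernels):
    """Return (sessions whose kernel lookup succeeded, kernel ids that failed)."""
    # return_exceptions=True keeps one failed lookup from aborting the rest.
    results = await asyncio.gather(
        *(fetch_kernel_model(s["kernel_id"], live_kernels) for s in sessions),
        return_exceptions=True,
    )
    found, missing = [], []
    for session, result in zip(sessions, results):
        if isinstance(result, Exception):
            missing.append(session["kernel_id"])
        else:
            found.append({**session, "kernel": result})
    return found, missing


def demo():
    # Two sessions, but only one kernel is still alive (ids are illustrative).
    sessions = [
        {"path": "a.ipynb", "kernel_id": "8cc66e44"},
        {"path": "b.ipynb", "kernel_id": "d389b3c6"},
    ]
    live_kernels = {"8cc66e44": "idle"}
    return asyncio.run(list_sessions_partial(sessions, live_kernels))
```

Calling `demo()` returns the session for the surviving kernel together with the id of the one that could not be found, so a caller could still render the listing and flag the stale session separately.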
JEG log where you can see the calls that Jupyter is doing for multiple kernels
Starting IPython kernel for Spark Cluster mode on behalf of user shaertel
[I 2019-11-15 12:50:35.284 EnterpriseGatewayApp] ApplicationID: 'app-20191115125034-0007-0cff530c-4325-4688-b204-c0229fd2869a' assigned for KernelID: '8cc66e44-8238-4454-9bb5-2a0cf0074ebe', state: WAITING, 14.0 seconds after starting.
[I 2019-11-15 12:50:35.341 EnterpriseGatewayApp] Kernel started: 8cc66e44-8238-4454-9bb5-2a0cf0074ebe
[I 191115 12:50:35 web:2246] 201 POST /api/kernels (9.21.58.126) 14017.63ms
[I 191115 12:50:35 web:2246] 200 GET /api/kernels/8cc66e44-8238-4454-9bb5-2a0cf0074ebe (9.21.58.126) 2.50ms
[I 191115 12:50:35 web:2246] 200 GET /api/kernels/8cc66e44-8238-4454-9bb5-2a0cf0074ebe (9.21.58.126) 0.72ms
[W 2019-11-15 12:50:35.456 EnterpriseGatewayApp] No session ID specified
[I 191115 12:50:35 web:2246] 101 GET /api/kernels/8cc66e44-8238-4454-9bb5-2a0cf0074ebe/channels (9.21.58.126) 14.12ms
[I 2019-11-15 12:50:42.620 EnterpriseGatewayApp] KernelRestarter: restarting kernel (1/5), keep random ports
[W 2019-11-15 12:50:42.621 EnterpriseGatewayApp] Remote kernel (d389b3c6-a72b-4865-821d-974a7bcccf06) will not be automatically restarted since there are no clients connected at this time.
[I 2019-11-15 12:50:42.746 EnterpriseGatewayApp] Kernel shutdown: d389b3c6-a72b-4865-821d-974a7bcccf06
[I 2019-11-15 12:50:46.326 EnterpriseGatewayApp] Starting buffering for 8cc66e44-8238-4454-9bb5-2a0cf0074ebe:4cf9dc54-4f65b5bdedd2ae520723a69c
[I 191115 12:50:49 web:2246] 200 GET /api/kernelspecs (9.21.58.126) 11.16ms
[W 191115 12:50:49 web:1782] 404 GET /api/kernels/d389b3c6-a72b-4865-821d-974a7bcccf06 (9.21.58.126): Kernel does not exist: d389b3c6-a72b-4865-821d-974a7bcccf06
[W 191115 12:50:49 web:2246] 404 GET /api/kernels/d389b3c6-a72b-4865-821d-974a7bcccf06 (9.21.58.126) 3.34ms
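For anyone triaging a longer log of this shape, the following is a small sketch that picks out the kernel ids whose `GET /api/kernels/<id>` requests failed. The regular expression and the helper name `failed_kernel_ids` are my own assumptions based only on the access-log lines shown above, not part of any Jupyter tooling.

```python
import re

# Matches access-log lines of the shape shown above, for example:
# [W 191115 12:50:49 web:2246] 404 GET /api/kernels/<uuid> (ip) 3.34ms
ACCESS_LINE = re.compile(
    r"\] (?P<status>\d{3}) GET /api/kernels/(?P<kernel_id>[0-9a-f-]{36})(?=[ /])"
)


def failed_kernel_ids(log_lines):
    """Return kernel ids with a 4xx/5xx GET, in first-seen order, without repeats."""
    seen = []
    for line in log_lines:
        match = ACCESS_LINE.search(line)
        if not match:
            continue
        kernel_id = match.group("kernel_id")
        if int(match.group("status")) >= 400 and kernel_id not in seen:
            seen.append(kernel_id)
    return seen


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "[I 191115 12:50:35 web:2246] 200 GET /api/kernels/8cc66e44-8238-4454-9bb5-2a0cf0074ebe (9.21.58.126) 2.50ms",
    "[W 191115 12:50:49 web:1782] 404 GET /api/kernels/d389b3c6-a72b-4865-821d-974a7bcccf06 (9.21.58.126): Kernel does not exist: d389b3c6-a72b-4865-821d-974a7bcccf06",
    "[W 191115 12:50:49 web:2246] 404 GET /api/kernels/d389b3c6-a72b-4865-821d-974a7bcccf06 (9.21.58.126) 3.34ms",
]
```

Running `failed_kernel_ids(SAMPLE)` on these lines singles out the kernel that was shut down at 12:50:42 and then requested again at 12:50:49.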
Hi @stevehaertel. This is a bizarre day, as it's the second occurrence (see #5055) of a gateway-related issue that should have been witnessed before, which leads me to believe there's been some kind of change, or that something has side-affected things such that these issues are now surfacing. That said, I don't tend to have kernel issues or let culling occur very often, so perhaps this is just a humble reminder. 😄
On the bright side, if I run with the updated file in #5055, I don't see this issue on my Notebook.
I can reproduce your issue after a kernel has been culled (which may be a similar scenario to the failing cases you have). After culling, the /api/sessions request, which ultimately hits the EG server to collect the running kernel models, fails, and because of the error handling (fixed in #5055), the request from the browser fails as well (I presume; I'm not a front-end person). Since the directory listing always follows the /api/sessions request, the contents request is not satisfied and, thus, the Files tab is empty. Here are the two NB log entries from my system when /api/sessions succeeds:
[D 12:09:41.057 NotebookApp] 200 GET /api/sessions?_=1573848580376 (::1) 634.28ms
[D 12:09:41.069 NotebookApp] 200 GET /api/contents/alice/YARN?type=directory&_=1573848580378 (::1) 6.42ms
Based on the output in your EG log, it looks like your kernels are failing to start in your Spark cluster. I'd be happy to help with those issues in either the EG gitter channel or via an issue in the EG repo - if you like.
@kevin-bates Hey it worked! :D In my test, after I manually kill 1 running Spark app, I go back into my notebook, and I can see both files there, and I can go into the kernel that I had originally stopped and go ahead and start another one with no problem :) @shuichiro-makigaki It worked! Thank you very much.
Environment:
Linux [hostname] 2.6.32-754.23.1.el6.x86_64 #1 SMP Tue Sep 17 09:46:55 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux