-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Support for configure magic on Spark Python Kubernetes Kernels (WIP) #1105
base: main
Are you sure you want to change the base?
[Feature] Support for configure magic on Spark Python Kubernetes Kernels (WIP) #1105
Conversation
for more information, see https://pre-commit.ci
response = requests.post( | ||
self.update_kernel_url, | ||
data=json.dumps(payload_dict, default=str), | ||
headers=self.headers, | ||
verify=False, | ||
) |
Check failure
Code scanning / CodeQL
Request without certificate validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should address this alert somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure...we can discuss the approach to resolve this.
logger.debug(f"Payload to refresh: {magic_payload}") | ||
result = self.update_kernel(magic_payload) | ||
return result | ||
return "Done" |
Check warning
Code scanning / CodeQL
Unreachable code
else: | ||
logger.error(f"Either key or value is not defined. {env_key}, {env_value}") | ||
|
||
def update_kernel(self, payload_dict): |
Check notice
Code scanning / CodeQL
Explicit returns mixed with implicit (fall through) returns
@@ -0,0 +1,296 @@ | |||
import base64 | |||
import json |
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from'
@@ -0,0 +1,296 @@ | |||
import base64 |
Check notice
Code scanning / CodeQL
Unused import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really interesting @rahul26goyal - thank you. I'm not sure we should hold the 3.0 release for this as this seems additive. Most of the comments are little things.
One thing I'm not sure about is that this only applies to Python kernels, and only those configured for Spark (at the moment, although it could be more general).
I'd also like to look into incorporating the RemoteZMQChannelsHandler
in a general way. I haven't given it a real close look, but do you think that could be useful even outside this particular "configure/restart" feature?
payload = self.get_json_body() | ||
self.log.debug(f"Request payload: {payload}") | ||
if payload is None: | ||
self.log.info("Empty payload in the request body.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These info
messages aren't necessary since the message returned to the client will indicate where it came from.
self.log.info("Empty payload in the request body.") | ||
self.finish( | ||
json.dumps( | ||
{"message": "Empty payload received. No operation performed on the Kernel."}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"message": "Empty payload received. No operation performed on the Kernel."}, | |
{"message": f"Empty payload received. No operation performed on kernel: {kernel_id}"}, |
self.log.info("Empty payload in the request body.") | ||
self.finish( | ||
json.dumps( | ||
{"message": "Empty payload received. No operation performed on the Kernel."}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"message": "Empty payload received. No operation performed on the Kernel."}, | |
{"message": f"Empty payload received. No operation performed on kernel: {kernel_id}"}, |
"An existing restart request is still in progress. Skipping this request." | ||
) | ||
raise web.HTTPError( | ||
400, "Duplicate Kernel update request received for Id: %s." % kernel_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
400, "Duplicate Kernel update request received for Id: %s." % kernel_id | |
400, "Duplicate configure kernel request received for kernel: %s." % kernel_id |
"An existing restart request is still in progress. Skipping this request." | ||
) | ||
raise web.HTTPError( | ||
400, "Duplicate Kernel update request received for Id: %s." % kernel_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
400, "Duplicate Kernel update request received for Id: %s." % kernel_id | |
400, "Duplicate configure kernel request received for kernel: %s." % kernel_id |
response = requests.post( | ||
self.update_kernel_url, | ||
data=json.dumps(payload_dict, default=str), | ||
headers=self.headers, | ||
verify=False, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should address this alert somehow.
@kevin-bates :
we need use this term in both logs and response messages. |
for more information, see https://pre-commit.ci
except ValueError as ve: | ||
logger.exception(f"Could not parse JSON object from input {cell}: error: {ve}.") | ||
return ConfigureMagic.INVALID_JSON_PAYLOAD | ||
except JSONDecodeError as jde: |
Check failure
Code scanning / CodeQL
Unreachable 'except' block
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
I guess refresh seems a little easier to understand than reconfigure. Does this imply the magic name would change to I would also like to see the endpoint be under |
Hi @rahul26goyal - what is the status of this PR since it's been about 6 weeks since its last update and it seems there are a few things still to work out? |
Problem Statement
With JEG running on a remote machine and handling the kernel life cycle, Notebook users can not longer change the Kernels specs / properties locally which would update the configuration with which Spark kernel comes up. There are various use cases where users want to play around and experiment with different spark configuration to arrive at the final configs which best suit their workload. These configs also might vary from one notebook to another based on the workload the notebook is doing. JEG is also used as we multi-tenant service where each user might want to tweak the kernel based on his/ her scenario.
Thus, there is a need for users to be able to update the kernel / spark properties at runtime from the notebook.
Feature Description
The changes proposed in this PR are to add support for a well known magic
%%configure -f {}
which allows Notebook users to change the spark properties at runtime without having to create / update any kernel spec file. This would allow users to change spark driver, executor resources (like cores, memory), enable / disable spark configuration etc.Example: The below snipped can be copied into a notebook cell to update the various spark properties associated with the current kernel.
Implementation Details
The below are the changes made at the high level:
POST api/configure/<kernel_id>
which accepts a payload similar to create kernel API. This API currently support updating the["KERNEL_EXTRA_SPARK_OPTS", "KERNEL_LAUNCH_TIMEOUT"]
env variables.kernel_id
same and want to give a smooth end user experience.exec_reply
message and to mark the kernel idle, we need to kernelstatus=idle
messages etc . These messages are pre-generated on the kernel and sent to JEG while making the API call to refresh the kernel.I will update more details about the changes and add some diagrams.
Testing
Note
Opening this PR for some early feedback and discussion on the changes.